INDEX
    Explanations

    the use of the word "rather" to express preference or contrast

    Negative Logits
    Token       Logit
    "sf"        -0.16
    "system"    -0.15
    "onders"    -0.15
    "sse"       -0.14
    "sl"        -0.14
    "sm"        -0.14
    "ussen"     -0.14
    "swer"      -0.14
    "imizer"    -0.14
    "ital"      -0.14
    Positive Logits
    Token       Logit
    " than"     0.37
    "-than"     0.30
    "than"      0.28
    "_than"     0.27
    " THAN"     0.25
    "Than"      0.24
    " Than"     0.22
    " než"      0.22
    " quam"     0.20
    " än"       0.19
    Activation Density: 0.015%
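
As a rough illustration of what a density figure like 0.015% would mean, the sketch below assumes that it denotes the share of tokens on which the feature has a non-zero activation; the page itself does not define the term. The function name `activation_density` and the toy data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means (number of active tokens) / (total tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 3 active tokens out of 20,000.
toy_activations = [0.0] * 19_997 + [0.8, 1.2, 0.3]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.015%
```

Under that assumption, a density of 0.015% corresponds to roughly 1.5 active tokens per 10,000 tokens.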

    No Known Activations