INDEX
    Explanations

    The word "rather" and its variants, used to express preferences or comparisons

    NEGATIVE LOGITS
    Token       Logit
    "swer"      -0.17
    "ys"        -0.14
    "itals"     -0.14
    " al"       -0.14
    "system"    -0.14
    "ateg"      -0.14
    "entre"     -0.14
    "chg"       -0.14
    "aneous"    -0.14
    "ray"       -0.14

    POSITIVE LOGITS
    Token       Logit
    " than"     0.30
    "than"      0.22
    "_than"     0.21
    "-than"     0.21
    "Than"      0.21
    " THAN"     0.20
    " než"      0.20
    "-таки"     0.20
    " Than"     0.18
    " quam"     0.18

    Activation Density: 0.014%

    No Known Activations