INDEX
    Explanations

    expressions of gratitude or refusal

    NEGATIVE LOGITS
    Token           Logit
    "順"            -0.15
    "hw"            -0.15
    "bjerg"         -0.14
    "soft"          -0.14
    "ulado"         -0.14
    "ysa"           -0.14
    "acro"          -0.14
    "ITT"           -0.14
    "itt"           -0.13
    "bast"          -0.13
    POSITIVE LOGITS
    Token           Logit
    " thank"        0.29
    "Thank"         0.23
    "thank"         0.23
    "pref"          0.23
    " Thank"        0.22
    "谢"            0.21
    " preference"   0.19
    " prefer"       0.18
    " THANK"        0.17
    "prefer"        0.17
    Activation Density: 0.178%

    No Known Activations