INDEX
    Explanations

    describing degree of feeling

    Negative Logits
     stupid        0.56
     stupidity     0.51
     filth         0.49
     outrageous    0.49
     falsch        0.49
     shitty        0.46
     पागल          0.43
     Stupid        0.42
    错了           0.41
     гря           0.41
    Positive Logits
     somewhat      1.66
     Somewhat      1.31
     quite         1.30
     rather        1.28
     bastante      1.23
     довольно      1.23
     agak          1.22
    有點           1.16
     piuttosto     1.15
    1.13
    Act Density 0.073%

    No Known Activations