    Negative Logits
    Token           Logit
    " important"    -1.80
    "important"     -1.77
    " importante"   -1.48
    " belangrijke"  -1.42
    " importanti"   -1.41
    " Important"    -1.40
    "importante"    -1.38
    " importantes"  -1.34
    "Important"     -1.34
    " belangrijk"   -1.34
    Positive Logits
    Token           Logit
    " and"          0.60
    " but"          0.55
    " enough"       0.54
    " theoretical"  0.52
    " social"       0.52
    " empirical"    0.49
    "tmpl"          0.49
    "lyde"          0.48
    "сторо"         0.47
    " clinical"     0.47
    Activation Density: 0.154%

    No Known Activations