INDEX
    Explanations
    Negative Logits
    Token           Logit
    " something"    -2.41
    "something"     -2.27
    "Something"     -1.96
    "someone"       -1.95
    " someone"      -1.93
    " Something"    -1.88
    " qualcosa"     -1.73
    " SOMETHING"    -1.72
    " somebody"     -1.70
    "somebody"      -1.65
    Positive Logits
    Token           Logit
    " else"         0.99
    " Else"         0.68
    "WEBPACK"       0.66
    "'"             0.65
    "Else"          0.64
    " ELSE"         0.64
    (blank)         0.63
    " els"          0.59
    " el"           0.57
    " же"           0.56
    Activation Density: 1.570%

    No Known Activations