    Negative Logits
     designers          -0.08
     selectively        -0.08
    .fn                 -0.08
     völlig             -0.07
     buscan             -0.07
     scouts             -0.07
     showcases          -0.07
    (no visible token)  -0.07
     πρώτο              -0.07
    (no visible token)  -0.07
    Positive Logits
    _correct            0.09
     adolescente        0.09
     correta            0.09
    _success            0.09
    Correct             0.09
     correcta           0.09
     Correct            0.09
     Adolesc            0.08
    正确                0.08
     correg             0.08
    Act Density 0.004%

    No Known Activations