    Negative Logits
     interesante    0.57
     légèrement     0.56
     importante     0.54
     vikt           0.53
     importantes    0.51
     Slightly       0.49
     fontos         0.47
     importants     0.47
     slightly       0.47
     কিছুটা          0.47
    Positive Logits
    简直              1.11
     absolutely     1.02
     literally      0.97
     absolutamente  0.96
     truly          0.96
     practically    0.93
     rival          0.91
     буквально      0.91
     utterly        0.91
     rivals         0.88
    Activation Density: 0.092%

    No Known Activations