INDEX
    Explanations
    NEGATIVE LOGITS
    " bonus"          -0.09
    "Bonus"           -0.08
    " pound"          -0.08
    " while"          -0.07
    " enquanto"       -0.07
    " nano"           -0.07
    "-mini"           -0.07
    " Unterhaltung"   -0.07
    " mientras"       -0.07
    "非常"            -0.07
    POSITIVE LOGITS
    " Formação"       0.08
    " imprensa"       0.08
    " Atua"           0.08
    " enfrent"        0.08
    " retom"          0.08
    " המקום"          0.08
    "ifizieren"       0.08
    " Ordered"        0.08
    " Contudo"        0.08
    "’ad"             0.08
    Activation Density: 0.001%

    No Known Activations