INDEX
Explanations
words related to strong negative emotions or reactions
expressions of anger or frustration
Negative Logits
acea      -0.88
erva      -0.85
cius      -0.82
oak       -0.81
arette    -0.78
querque   -0.77
win       -0.77
elia      -0.76
pty       -0.75
ynski     -0.75
Positive Logits
idious    0.74
ultras    0.70
Furious   0.68
Attacks   0.64
Dug       0.64
ãĥ£       0.62
quished   0.61
icago     0.61
furious   0.61
err       0.60
Activations Density 0.040%