Explanations
expressions of anger or criticism
instances of strong emotional expressions, particularly rants and tirades
Negative Logits
Token           Logit
undai           -0.96
earances        -0.77
acquisitions    -0.75
licts           -0.71
prus            -0.69
emale           -0.68
phis            -0.66
metics          -0.65
ppo             -0.64
rity            -0.63
Positive Logits
Token           Logit
tir             1.00
rant            0.95
aloud           0.86
loudly          0.81
spew            0.74
angrily         0.73
against         0.73
louder          0.72
goodbye         0.72
uttered         0.70
Activations Density: 0.038%