INDEX
Explanations
referential phrases related to anger and anger management
Negative Logits
Token     Logit
esser     -0.16
ancers    -0.15
ected     -0.14
itals     -0.14
lider     -0.14
ást       -0.14
Bomb      -0.14
/lab      -0.14
bombs     -0.14
Bomb      -0.14
Positive Logits
Token     Logit
ulent     0.18
Amp       0.14
ulance    0.14
imir      0.14
üh        0.14
mittel    0.14
?}",      0.14
lius      0.14
th        0.13
one       0.13
Activations Density: 0.037%