Explanations
expressions related to punishment and aggression
Negative Logits
fel        -0.15
費         -0.15
ToStr      -0.15
Kens       -0.14
209        -0.14
411        -0.14
ochen      -0.14
Ladies     -0.14
yn         -0.14
lad        -0.13
Positive Logits
_operand    0.18
uger        0.15
üz          0.15
영          0.15
rawer       0.14
ugas        0.14
jez         0.14
priest      0.14
odom        0.13
utenberg    0.13
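Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the tokens with the most negative and most positive results. The sketch below assumes that approach. `logit_effects` and `top_logits` are hypothetical helpers, not the dashboard's actual code, and the example values are copied from the lists above purely for illustration.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical: dot the feature's decoder direction with each token's unembedding column."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col)) for tok, col in unembed.items()}

def top_logits(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) token/effect pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by effect
    negative = ranked[:k]        # smallest effects first
    positive = ranked[::-1][:k]  # largest effects first
    return negative, positive

if __name__ == "__main__":
    # Illustrative subset of the values listed above, not the full logit vector.
    effects = {"fel": -0.15, "lad": -0.13, "priest": 0.14, "_operand": 0.18, "odom": 0.13}
    neg, pos = top_logits(effects, 2)
    print(neg)  # [('fel', -0.15), ('lad', -0.13)]
    print(pos)  # [('_operand', 0.18), ('priest', 0.14)]
```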
Activation density: 0.258%
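Activation density is usually read as the percentage of token positions on which the feature fires. A density of 0.258% therefore corresponds to roughly 2,580 activating tokens per 1,000,000 tokens sampled. The sketch below assumes "fires" means an activation strictly greater than zero. The threshold and the helper name `activation_density` are assumptions, not taken from the dashboard.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical: percentage of token positions whose activation exceeds `threshold`."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)  # count positions where the feature fires
    return 100.0 * fired / len(acts)

if __name__ == "__main__":
    # Toy data: 3 firing positions out of 1,000 tokens gives a density of about 0.3%.
    print(activation_density([0.0] * 997 + [1.0, 2.0, 0.5]))
```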