Explanations
violent actions and experiences related to oppression and abuse
Negative Logits
chal          -0.15
결            -0.14
compos        -0.14
achable       -0.14
_except       -0.14
distracting   -0.14
chia          -0.13
?url          -0.13
assass        -0.13
hij           -0.13
Positive Logits
torture       0.40
Tort          0.39
beat          0.39
TORT          0.39
tort          0.39
beating       0.33
beat          0.31
abuse         0.30
beaten        0.30
physical      0.29
Activations Density 0.197%