Explanations
mentions of victims in various contexts
references to victims of various forms of violence or abuse
Negative Logits
Token      Logit
arily      -0.71
hed        -0.71
heads      -0.70
liness     -0.70
ortment    -0.70
ahead      -0.69
susp       -0.68
hey        -0.68
ulum       -0.67
HK         -0.66
Positive Logits
Token             Logit
injustice         0.96
violence          0.91
vandalism         0.87
persecution       0.87
abuse             0.86
discrimination    0.85
oppression        0.85
arson             0.84
crime             0.84
tyranny           0.81
Activations Density: 0.089%