INDEX
Explanations
mentions of violence
words related to various forms and contexts of violence
Negative Logits
Token       Logit
ocular      -0.84
missions    -0.84
ramer       -0.77
odes        -0.76
sonian      -0.76
ergy        -0.74
gres        -0.74
arton       -0.74
dit         -0.72
ership      -0.71
Positive Logits
Token         Logit
perpetrated   1.01
violence      0.93
inflicted     0.91
suppression   0.82
Violence      0.78
repression    0.77
quit          0.74
prevention    0.73
oppression    0.72
spree         0.71
Activations Density: 0.023%