Explanations
words related to violent actions or situations
references to violence
Negative Logits
ocular      -0.84
erella      -0.78
sonian      -0.78
printed     -0.78
heit        -0.75
STER        -0.73
acent       -0.73
ļéĨĴ        -0.72
artments    -0.71
regon       -0.71
Positive Logits
violence     0.92
retribution  0.85
assault      0.85
deaths       0.83
acre         0.82
clashes      0.81
jihad        0.80
gang         0.80
suppression  0.79
scenes       0.79
Activations Density: 0.020%
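
The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of token positions where the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and used only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of entries strictly above `threshold`.

    Assumption: "density" means the share of token positions on which
    the feature fires at all. The source page does not confirm this.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 2 of them active.
acts = [0.0] * 10_000
acts[17] = 3.1
acts[4242] = 0.6

density = activation_density(acts)
print(f"{density:.3%}")  # 0.020%
```

Under this reading, a density of 0.020% corresponds to the feature firing on roughly 2 of every 10,000 token positions.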