INDEX
Explanations
references to violence
words related to physical aggression and conflict
Negative Logits
ocular      -0.87
missions    -0.83
sonian      -0.76
ership      -0.74
odes        -0.74
ergy        -0.73
ramer       -0.72
gres        -0.71
alty        -0.71
ITNESS      -0.70
Positive Logits
perpetrated  1.00
inflicted    0.91
violence     0.89
prevention   0.79
Violence     0.78
suppression  0.76
quit         0.73
fighting     0.73
repression   0.72
fully        0.71
Activations Density: 0.026%