Explanations
terms related to violence and actions to prevent it
mentions of violence and its implications or contexts
Negative Logits
Token      Logit
ocular     -0.88
sonian     -0.82
odes       -0.75
missions   -0.74
dit        -0.74
osition    -0.73
alty       -0.71
arton      -0.71
ergy       -0.71
ership     -0.69
Positive Logits
Token        Logit
perpetrated  1.09
inflicted    0.98
violence     0.85
fighting     0.80
quit         0.80
Viol         0.77
against      0.76
erupted      0.76
Violence     0.75
hell         0.73
Activations Density: 0.044%
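
The density figure can be read as the share of token positions on which the feature fires. The sketch below is a minimal illustration under that assumption. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means (active positions) / (all positions).
    The dashboard's exact definition is not stated in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, with the feature active on 44.
sample = [0.0] * 99_956 + [1.5] * 44
density = activation_density(sample)
print(f"{density:.3%}")
```

With these made-up counts, 44 active positions out of 100,000 give a density of 0.044%, the same order as the reported value, which suggests the feature fires on roughly one token in every two to three thousand.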