INDEX
Explanations
phrases related to violence and conflict
descriptors related to violence and conflict
Negative Logits
natureconservancy   -0.78
paren               -0.77
pring               -0.72
uart                -0.71
ource               -0.71
abase               -0.71
arten               -0.70
earcher             -0.69
leans               -0.69
irmation            -0.67
Positive Logits
havoc               0.98
retribution         0.94
assaults            0.93
harassing           0.92
murders             0.92
murdering           0.92
vicious             0.89
intimidation        0.89
treason             0.88
thugs               0.88
Activations Density: 0.979%
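
The density figure can be read as the share of sampled tokens on which the feature fires. The following is a minimal sketch, assuming density means the percentage of tokens whose activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the fraction of sampled tokens on which the
    feature is active, expressed as a percentage.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical per-token activations for one feature (illustrative only).
sample = [0.0, 0.0, 1.3, 0.0, 0.7, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"Activations Density: {density:.3f}%")
```

Under this reading, a density of 0.979% would mean the feature fires on roughly one token in a hundred.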