INDEX
Explanations
references to violent acts and crime
topics related to violence and human rights abuses
Negative Logits
Pebble       -0.87
Canter       -0.85
ibilities    -0.81
irement      -0.75
Flex         -0.74
Premium      -0.74
ellen        -0.74
hao          -0.73
Apple        -0.72
Compass      -0.70
Positive Logits
massacres    1.84
atroc        1.72
massacre     1.70
atrocities   1.66
genocide     1.60
killings     1.55
slaughter    1.53
murderers    1.51
murders      1.51
gruesome     1.50
Activations Density: 1.078%