INDEX
Explanations
phrases related to acts of violence or conflict
Negative Logits
ãĤ´ãĥ³    -0.99
redes     -0.78
bet       -0.74
spons     -0.73
soType    -0.72
HUD       -0.71
Ü         -0.70
cano      -0.68
aird      -0.67
algia     -0.65
Positive Logits
unsuspecting   1.31
innocent       1.05
civilians      1.00
unarmed        0.95
targets        0.93
anyone         0.89
him            0.86
opponents      0.85
anybody        0.84
somebody       0.83
Activations Density 1.695%
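
As a rough illustration of how a density figure like the one above could be obtained, the sketch below assumes that "activations density" means the fraction of token positions on which the feature fires above some threshold. That reading, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; the page itself does not state how the number is computed.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition, for illustration only; the dashboard does not
    document how its density value is derived.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 17 of which activate.
sample = [0.0] * 983 + [0.5] * 17
print(f"{activation_density(sample):.3%}")  # prints 1.700%
```

Under this assumed definition, a density of 1.695% would mean the feature is active on roughly 17 of every 1000 tokens sampled.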