INDEX
Explanations
references to terrorist attacks
references to violent incidents or acts of aggression
Negative Logits
Token     Logit
OVA       -0.66
YC        -0.66
Vale      -0.65
ETA       -0.65
Wait      -0.65
view      -0.64
66666666  -0.64
Vide      -0.63
VAL       -0.63
tz        -0.62
Positive Logits
Token      Logit
attacks    1.24
attack     1.15
attacks    1.13
attack     1.05
Attacks    0.99
attackers  0.99
Attack     0.89
assaults   0.86
spree      0.81
CVE        0.78
Activations Density 0.025%