INDEX
Explanations
mentions of violent acts, particularly bombings
references to bombing incidents or campaigns
Negative Logits
laus         -0.93
videos       -0.75
ITY          -0.72
learn        -0.72
MpServer     -0.71
galitarian   -0.70
dit          -0.70
los          -0.69
FINE         -0.69
olars        -0.69
Positive Logits
bombing       1.16
bombings      0.99
bomber        0.96
raids         0.95
spree         0.89
barr          0.89
bombers       0.88
bombard       0.87
shelter       0.83
shelters      0.83
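The two logit lists are the tokens whose output logits this feature most raises and most lowers. In sparse-autoencoder dashboards these per-token effects are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix. The minimal sketch below assumes that per-token effect vector is already available and shows only the ranking step that produces a top-k positive list and a top-k negative list. The function name `top_logits` is hypothetical, and the toy input reuses a few values from the lists above.

```python
# Hypothetical sketch: rank tokens by the feature's (assumed precomputed)
# effect on each token's logit, returning the k most promoted and the
# k most suppressed tokens, as a dashboard's logit panels display them.
from typing import Dict, List, Tuple


def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort ascending by effect: most negative first, most positive last.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]  # most promoted tokens, largest first
    return positive, negative


# Toy input: a few (token, effect) pairs copied from the lists above.
effects = {
    "bombing": 1.16, "bombings": 0.99, "bomber": 0.96,
    "laus": -0.93, "videos": -0.75, "ITY": -0.72,
}
pos, neg = top_logits(effects, 2)
print(pos)  # [('bombing', 1.16), ('bombings', 0.99)]
print(neg)  # [('laus', -0.93), ('videos', -0.75)]
```

Sorting once and slicing both ends keeps the sketch short. For a full vocabulary of tens of thousands of tokens, `heapq.nlargest` and `heapq.nsmallest` would avoid a full sort.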
Activation Density: 0.024%
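Activation density is the share of sampled tokens on which the feature fires. A density of 0.024% means roughly 24 firing tokens per 100,000, so this is a very sparse feature. The sketch below assumes "fires" means an activation strictly greater than zero. Some dashboards may use a different threshold. The function name `activation_density` and the toy data are hypothetical.

```python
# Hypothetical sketch: activation density = fraction of sampled tokens whose
# feature activation is strictly positive (threshold of 0 is an assumption).
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0.0)
    return firing / len(activations)


# Toy data: 24 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for i in range(0, 24 * 1000, 1000):  # indices 0, 1000, ..., 23000
    acts[i] = 1.5
print(f"{activation_density(acts):.3%}")  # 0.024%
```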