Explanations
words related to violent acts or attacks, particularly involving explosive devices
mentions of bombing
Negative Logits
laus      -0.97
learn     -0.82
eva       -0.79
upload    -0.74
Lear      -0.71
ITY       -0.70
dit       -0.69
Lean      -0.67
BOOK      -0.67
mia       -0.67
Positive Logits
bombing        1.31
bombings       1.20
spree          1.10
raids          1.09
bomber         1.04
bombers        0.99
barr           0.97
bombard        0.88
raid           0.88
bombardment    0.85
Activations Density 0.013%