Explanations
words related to violent actions like assassination, hacking, bombing, and killing
references to acts of violence and targeted attacks
Negative Logits
Token                     Logit
Cola                      -0.84
izont                     -0.78
itte                      -0.78
mie                       -0.76
arine                     -0.75
Moder                     -0.73
ellar                     -0.72
BuyableInstoreAndOnline   -0.69
ertain                    -0.69
ube                       -0.69
Positive Logits
Token                     Logit
spree                     1.40
rampage                   1.06
allegation                1.00
scandal                   0.99
attempt                   0.99
frenzy                    0.97
accusation                0.95
affair                    0.91
massacre                  0.88
of                        0.87
Activations Density 0.224%