INDEX
Explanations
references to violent actions or crimes involving death
Negative Logits
BuyableInstoreAndOnline  -0.82
wcsstore                 -0.73
Cola                     -0.70
Collider                 -0.70
Scot                     -0.70
por                      -0.67
Bubble                   -0.65
cession                  -0.64
oday                     -0.64
ertain                   -0.63
Positive Logits
spree      1.05
switch     0.88
mails      0.87
rampage    0.87
joy        0.83
innocent   0.82
civilians  0.82
murderer   0.81
murdering  0.81
slain      0.79
Activations Density: 0.628%