INDEX
Explanations
mention of violent actions, particularly stabbings
terms related to stabbing incidents
Negative Logits
mberg                      -0.78
Administ                   -0.69
Organization               -0.69
XM                         -0.68
Cosmos                     -0.68
Leban                      -0.66
oS                         -0.65
administ                   -0.65
tz                         -0.64
BuyableInstoreAndOnline    -0.63
Positive Logits
lished      1.09
wounds      1.00
slit        0.89
stab        0.86
spree       0.84
rampage     0.82
nesday      0.79
stabbing    0.79
dagger      0.78
wound       0.77
Activations Density: 0.030%