INDEX
Explanations
mentions of acts of violence or deaths of individuals
occurrences of the word "killed" related to violent events or deaths
Negative Logits
Token            Logit
rium             -0.81
DragonMagazine   -0.79
idth             -0.76
ĨĴ               -0.71
Cola             -0.67
users            -0.63
SPONSORED        -0.60
Applic           -0.60
Factor           -0.60
Club             -0.60
Positive Logits
Token            Logit
by               0.92
spree            0.89
fighting         0.83
unarmed          0.80
tragically       0.79
rampage          0.78
wounding         0.74
fighting         0.74
instantly        0.74
innoc            0.74
Activations Density 0.067%
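
For readers unfamiliar with the density figure, the sketch below shows one plausible way such a number could be computed: the fraction of token positions on which the feature's activation is above zero. This definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; the source does not state how its density value was derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition (not confirmed by the source): density is the share
    of sampled tokens on which the feature fires at all.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3000 token positions, only 2 of which activate.
sample = [0.0] * 2998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.067% would mean the feature fires on roughly 2 of every 3000 tokens, which fits a feature tied to a narrow set of words about violent events.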