INDEX
Explanations
references to murder and related violent crimes
Negative Logits
laÅŁ     -0.15
lage    -0.14
å·ŀ     -0.14
izu     -0.14
itzer   -0.14
ERA     -0.14
suming  -0.14
åĦ¿     -0.14
uplic   -0.14
stanov  -0.13
Positive Logits
-death  0.16
-su     0.15
anova   0.15
ously   0.15
ous     0.15
spree   0.14
Scenes  0.14
stk     0.14
auer    0.14
greg    0.14
Activations Density 0.021%
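
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of sampled token positions on which the feature's activation exceeds a threshold; the page itself does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and are not taken from this feature's data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires. This definition is illustrative, not confirmed by the source.
    """
    if not activations:
        raise ValueError("activations must contain at least one value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, of which 2 are active.
sample = [0.0] * 9998 + [1.3, 0.7]
density = activation_density(sample)

# Format as a percentage, the same way the dashboard reports it.
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.021% would mean the feature fires on roughly 2 of every 10,000 sampled tokens, which fits a feature tied to a narrow topic.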