INDEX
Explanations
references to violent acts and criminal incidents
phrases related to crime and violence
Negative Logits
(#       -0.63
Factor   -0.62
ist      -0.61
Minion   -0.60
osaurus  -0.60
dict     -0.59
ouble    -0.58
iel      -0.57
otally   -0.55
toc      -0.55
Positive Logits
agher                             0.92
eating                            0.69
rawdownloadcloneembedreportprint  0.67
swick                             0.66
opian                             0.66
Britain                           0.64
accompan                          0.64
mercial                           0.62
England                           0.61
Blackburn                         0.61
Activations Density: 0.128%