Explanations
references to crime and criminal activity
Negative Logits
Token       Logit
ointment    -0.18
tle         -0.17
tems        -0.17
mund        -0.16
arian       -0.15
nev         -0.15
rais        -0.15
stood       -0.15
tlement     -0.14
undry       -0.14
Positive Logits
Token       Logit
Scene        0.19
spree        0.18
scene        0.17
ully         0.16
committed    0.16
olvers       0.16
cene         0.16
against      0.16
ythe         0.15
scenes       0.15
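The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). The sketch below shows one way such lists can be produced. It assumes that each score is the feature's decoder direction projected through the model's unembedding matrix, which is its direct effect on the logits. The helper `top_logit_tokens`, the toy vocabulary, and both matrices are hypothetical illustrations, not the dashboard's actual code or data.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: rank tokens by a feature's direct logit effect.

    decoder_dir: (d_model,) decoder direction of one feature (assumed input)
    W_U:         (d_model, vocab_size) unembedding matrix (assumed input)
    """
    # One score per vocabulary entry: how much writing this direction
    # into the residual stream raises or lowers that token's logit.
    effects = decoder_dir @ W_U
    order = np.argsort(effects)  # ascending, so most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with d_model = 3 and a 4-token vocabulary (made-up numbers).
vocab = ["scene", "spree", "ointment", "tle"]
W_U = np.array([[1.0, 0.5, -1.0, 0.0],
                [0.0, 0.5, 0.0, -0.5],
                [0.0, 0.0, 0.0, 0.0]])
decoder_dir = np.array([0.2, 0.1, 0.0])

neg, pos = top_logit_tokens(decoder_dir, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and slicing from both ends yields the negative and positive lists from a single pass over the vocabulary.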
Activation Density: 0.018%
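Activation density is read here as the share of tokens on which the feature fires. At 0.018%, that is roughly 18 tokens per 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The helper `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's exact counting rule may differ.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    acts = np.asarray(activations, dtype=float)
    # Count tokens where the feature is active, then express as a percentage.
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy check: 18 active tokens out of 100,000.
acts = np.zeros(100_000)
acts[:18] = 1.0
print(activation_density(acts))
```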