INDEX
Explanations
mentions of crime-related words or terms
words related to crime and criminal activity
Negative Logits
lihood      -0.84
Centauri    -0.82
VIDEO       -0.72
FORM        -0.71
lists       -0.69
#$          -0.68
zl          -0.66
Annex       -0.64
hof         -0.63
Oo          -0.62
Positive Logits
inals       1.00
eware       0.92
inally      0.89
inatory     0.88
ming        0.87
etric       0.86
acies       0.85
inant       0.82
bers        0.81
orah        0.81
Activations Density: 0.029%