Explanations
phrases related to legal proceedings and criminal activities
Negative Logits
iframe      -0.69
selves      -0.68
lly         -0.66
stuff       -0.61
voice       -0.61
emo         -0.61
pointers    -0.60
lot         -0.60
rejoice     -0.59
uces        -0.59
Positive Logits
captivity    1.36
prison       1.19
prison       1.11
patient      1.06
jail         0.99
isolation    0.98
custody      0.97
exile        0.96
remission    0.93
activity     0.93
Activation Density: 0.081%