INDEX
Explanations
references to inmates or prison-related terms
mentions of inmates and corrections-related terms
Negative Logits
rous        -0.73
イト        -0.71
ーン        -0.70
ku          -0.68
drive       -0.67
ール        -0.67
Nare        -0.67
orically    -0.66
efully      -0.66
issan       -0.66
Positive Logits
inmates         0.93
inmate          0.92
icts            0.91
iaries          0.90
iary            0.85
Facility        0.75
incarcerated    0.72
Correctional    0.72
correctional    0.70
arians          0.70
Activations Density 0.026%