INDEX
Explanations
mentions of prisons or related terms like prison sentences
references to prisons and the prison system
Negative Logits
yip         -0.95
lass        -0.79
ï¸ı         -0.79
thora       -0.77
udden       -0.74
omatic      -0.71
rians       -0.70
rian        -0.69
::::::::    -0.68
witz        -0.67
Positive Logits
inmates       0.93
prisons       0.86
inmate        0.84
prison        0.82
prison        0.80
barr          0.79
sentences     0.78
gul           0.78
confinement   0.76
camps         0.75
Activations Density: 0.026%