INDEX
Explanations
phrases related to prisons or incarceration
references to prisons and the prison system
Negative Logits
yip       -0.98
lass      -0.77
thora     -0.70
rians     -0.70
ï¸ı       -0.70
omatic    -0.69
rian      -0.69
udden     -0.69
witz      -0.66
zinski    -0.65
Positive Logits
inmates        0.95
prisons        0.85
inmate         0.85
sentences      0.84
confinement    0.79
prison         0.78
barr           0.78
prisoners      0.77
prison         0.77
house          0.76
Activations Density: 0.030%