Explanations
phrases related to incarceration or confinement
terms related to prisoners and their conditions
Negative Logits
Token          Logit
drive          -0.78
UTH            -0.65
orp            -0.65
ilde           -0.64
lag            -0.63
dust           -0.63
Boll           -0.63
Pillar         -0.63
Ples           -0.62
orem           -0.62
Positive Logits
Token          Logit
incarcerated    1.03
sentenced       1.01
freed           0.90
housed          0.89
inmates         0.88
jailed          0.85
prisoners       0.84
inmate          0.81
prisoner        0.80
confinement     0.79
Activation density: 0.053%