Explanations
terms related to the prison system and incarceration
Negative Logits
ado             -0.16
ulle            -0.15
gentlemen       -0.15
sik             -0.15
-eslint         -0.15
UDGE            -0.14
rim             -0.14
imprisonment    -0.14
prisons         -0.14
unched          -0.14
Positive Logits
ers             0.28
house           0.23
cells           0.22
/release        0.22
sentence        0.21
-cell           0.20
camps           0.20
sentences       0.20
camp            0.20
term            0.20
Activations Density: 0.030%