INDEX
Explanations
terms related to incarceration and the prison system
Negative Logits
Token       Logit
大全        -0.16
สำน         -0.16
ado         -0.14
yx          -0.14
urally      -0.14
kiện        -0.14
ibel        -0.14
906         -0.14
sik         -0.14
policing    -0.13
Positive Logits
Token       Logit
house       0.20
ers         0.19
ieri        0.19
nier        0.18
overcrow    0.18
planet      0.17
cells       0.17
ors         0.17
break       0.16
bars        0.16
Activation Density: 0.023%