Explanations
references to incarceration and the criminal justice system
Negative Logits
urally      -0.16
kiện        -0.15
reff        -0.15
сти         -0.14
elp         -0.14
oca         -0.14
hin         -0.14
fried       -0.13
ذه          -0.13
policing    -0.13
Positive Logits
iron        0.17
bars        0.16
house       0.16
camps       0.16
uitka       0.16
umba        0.16
cells       0.16
bound       0.15
LW          0.15
-cell       0.15
Activations Density 0.028%