INDEX
Explanations
references to prisons and correctional systems
Negative Logits
ุà¹Ī        -0.15
ëł¹        -0.14
_sampler   -0.14
aar        -0.14
_Device    -0.13
utr        -0.13
аÑĤи       -0.13
éric       -0.13
sublic     -0.13
æ»ħ        -0.13
Positive Logits
prison      0.76
Prison      0.68
prisoner    0.65
prisoners   0.63
jail        0.61
inmate      0.61
prisons     0.59
inmates     0.57
Jail        0.53
Correction  0.50
Activations Density: 0.317%