INDEX
Explanations
proper nouns related to jail or correctional facilities
references to specific individuals and locations associated with incarceration
Negative Logits
Token        Logit
arted        -0.80
Cumber       -0.80
dred         -0.72
trop         -0.72
trop         -0.70
Chero        -0.69
olition      -0.69
partisans    -0.67
Brook        -0.67
ply          -0.64
Positive Logits
Token        Logit
Sz           1.70
Ign          1.69
Jail         1.56
Ign          1.19
conn         1.02
Paste        0.99
Lab          0.98
****         0.96
Alvarez      0.96
Liam         0.94
Activations Density: 0.046%