INDEX
Explanations
phrases related to being in prison or behind bars
references to incarceration or imprisonment
Negative Logits
Token      Logit
ctive      -1.05
ULAR       -0.80
aneous     -0.78
actus      -0.72
ulously    -0.72
ples       -0.71
GENERAL    -0.70
itional    -0.69
RECT       -0.69
vier       -0.68
Positive Logits
Token      Logit
manship    0.93
hips       0.89
cape       0.87
hift       0.85
bars       0.84
eters      0.82
poons      0.80
mith       0.78
hirt       0.78
poon       0.76
Activations Density 0.034%
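
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is nonzero, might look like the following. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density means "share of tokens on which the feature fires";
    the source page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, the feature fires on 3 of them.
acts = [0.0] * 10_000
for position in (12, 4051, 9000):
    acts[position] = 1.7

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.034% would mean the feature fires on roughly 3 or 4 tokens out of every 10,000 sampled, which fits a narrow feature such as one tied to references to imprisonment.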