INDEX
Explanations
mentions of being imprisoned or serving time behind bars
mentions of "bars," relating to confinement or places where people gather
Negative Logits
ctive          -0.91
lihood         -0.75
IBLE           -0.74
sie            -0.73
EngineDebug    -0.71
UAL            -0.70
ALLY           -0.69
åĬ             -0.68
ULAR           -0.67
GENERAL        -0.66
Positive Logits
hops           1.07
bars           1.04
poon           1.03
hop            1.02
manship        0.97
becue          0.95
itone          0.90
mith           0.89
bell           0.86
bars           0.86
Activations Density 0.005%