Explanations
mentions of legal punishment, specifically sentences involving prison time
mentions of prison sentences
Negative Logits
endor             -0.82
soDeliveryDate    -0.80
ãĤ¤ãĥĪ            -0.75
yip               -0.74
xy                -0.69
rians             -0.69
alter             -0.68
ãĥ¼ãĥĨãĤ£         -0.68
ãĤ¡               -0.67
Discord           -0.66
Positive Logits
sentences         0.98
sentence          0.97
imprisonment      0.90
convictions       0.80
imprison          0.79
jailed            0.79
manslaughter      0.77
agy               0.76
inmates           0.75
bail              0.75
Activations Density: 0.020%