Explanations
mentions of sentences with legal consequences, specifically focusing on prison sentences
mentions of prison sentences and related legal consequences
Negative Logits
xit     -0.78
tick    -0.70
ween    -0.70
amins   -0.67
endor   -0.67
Birth   -0.66
flush   -0.64
alter   -0.63
LAN     -0.63
yip     -0.63
Positive Logits
sentences       0.95
sentence        0.89
jail            0.82
jailed          0.81
detain          0.80
imprisonment    0.80
sentenced       0.79
rehabilitation  0.77
agy             0.76
custody         0.75
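The dashboard does not say how these logit lists were computed. One common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction onto the model's unembedding matrix and sort the per-token scores. The sketch below uses toy, made-up weights; `top_bottom_logits`, `decoder_dir`, and `W_U` are hypothetical names, not part of the source.

```python
import numpy as np

def top_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Project a feature's decoder direction onto the unembedding (an assumed
    method) and return the k highest- and k lowest-scoring tokens."""
    # decoder_dir: shape (d_model,); W_U: shape (d_model, vocab_size)
    logits = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(logits)          # ascending order of scores
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return top, bottom

# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.9, -0.5, 0.1, 0.3],
                [0.0,  0.0, 0.0, 0.0]])
vocab = ["sentence", "tick", "the", "jail"]
top, bottom = top_bottom_logits(decoder_dir, W_U, vocab, k=2)
print(top)     # [('sentence', 0.9), ('jail', 0.3)]
print(bottom)  # [('tick', -0.5), ('the', 0.1)]
```

Under this assumption, a positive score means the feature pushes the model toward predicting that token, which would fit the prison-sentence vocabulary in the positive list.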
Activation Density: 0.035%
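Assuming activation density means the fraction of token positions on which the feature fires (activation above zero), 0.035% corresponds to roughly 35 active positions per 100,000 tokens. A minimal sketch of that calculation; `activation_density` and the toy activation array are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy data: 35 nonzero activations out of 100,000 token positions.
acts = np.zeros(100_000)
acts[:35] = 2.5
print(f"{activation_density(acts):.3%}")  # 0.035%
```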