INDEX
Explanations
proper nouns or words that seem to be nonsense
occurrences of the substring "um" in various forms
Negative Logits
blackout       -0.66
sober          -0.60
strawberries   -0.57
stranger       -0.57
lure           -0.56
Cuomo          -0.56
cutoff         -0.55
Barney         -0.54
heater         -0.54
wool           -0.53
Positive Logits
pty            1.30
ming           1.25
mers           1.18
ulative        1.11
brance         1.04
vir            1.03
etric          1.02
pt             0.98
acher          0.97
mary           0.97
Activations Density: 0.057%