INDEX
Explanations
proper nouns, specifically those related to political figures or events
occurrences of the name "Paul."
Negative Logits
XY           -0.75
Unt          -0.73
nond         -0.67
predictable  -0.67
notes        -0.64
controls     -0.63
Answer       -0.62
peripher     -0.62
clock        -0.62
Omega        -0.61
Positive Logits
aul    4.50
haul   1.19
caul   1.14
Maul   1.13
ael    1.05
ault   1.04
aur    0.99
au     0.97
alan   0.96
ulk    0.95
Activations Density 0.020%