Explanations
proper nouns, particularly names and titles
Negative Logits
ulo        -0.16
iss        -0.15
           -0.15
-bodied    -0.15
zin        -0.15
iability   -0.15
led        -0.15
lessly     -0.15
aska       -0.14
zan        -0.14
Positive Logits
errat      0.23
(mon       0.19
oton       0.19
gomery     0.19
serrat     0.18
roe        0.18
.Mon       0.17
tréal      0.17
итор       0.17
stery      0.17
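On dashboards of this kind, the two logit lists are typically the vocabulary tokens with the smallest and largest values of the feature's decoder direction projected through the model's unembedding matrix. The following is a minimal NumPy sketch under stated assumptions: `decoder_row` (the feature's row of the SAE decoder) and `W_U` (the unembedding) are hypothetical inputs, and the final LayerNorm is ignored, which is a common simplification. The toy numbers are illustrative only, not this feature's actual weights.

```python
import numpy as np

def feature_logit_weights(decoder_row, W_U):
    """Project one feature's decoder direction onto the vocabulary.

    decoder_row: (d_model,) hypothetical decoder direction for the feature
    W_U:         (d_model, vocab) unembedding matrix
    Returns a (vocab,) vector of per-token logit contributions.
    Assumption: the final LayerNorm is ignored.
    """
    return np.asarray(decoder_row) @ np.asarray(W_U)

def top_and_bottom_tokens(logit_weights, vocab, k=10):
    """Return the k most positive and k most negative tokens with their weights."""
    order = np.argsort(logit_weights)  # indices sorted by ascending weight
    negative = [(vocab[i], float(logit_weights[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_weights[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: identity unembedding, so the logit weights equal decoder_row.
vocab = ["errat", "ulo", "Mon", "the"]
W_U = np.eye(4)
decoder_row = np.array([0.3, -0.2, 0.1, 0.0])
pos, neg = top_and_bottom_tokens(feature_logit_weights(decoder_row, W_U), vocab, k=2)
print(pos)  # most positive tokens first
print(neg)  # most negative tokens first
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid a full sort.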
Activations Density 0.050%
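Activation density is commonly reported as the fraction of token positions on which the feature fires, that is, has an activation above zero; 0.050% corresponds to roughly one token in 2,000. The following is a minimal sketch, assuming a hypothetical 1-D array of this feature's activations over a sample of tokens and a firing threshold of 0. The dashboard's exact sampling procedure is not stated here.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`."""
    acts = np.asarray(activations)
    return float(np.mean(acts > threshold))

# Toy sample: 2,000 token positions, the feature fires on exactly one of them.
acts = np.zeros(2000)
acts[7] = 3.1
density = activation_density(acts)
print(f"{density:.3%}")  # formatted as a percentage with three decimals
```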