INDEX
Explanations
proper nouns, particularly names of people and places
Negative Logits
uzzi    -0.15
Ñĥма    -0.14
re      -0.13
ura     -0.13
alles   -0.13
(er     -0.13
Niet    -0.13
zie     -0.12
#       -0.12
ê»      -0.12
Positive Logits
-desktop  0.15
ecies     0.15
CTX       0.15
Pear      0.14
â̦↵↵      0.14
andi      0.14
Fauc      0.14
ï½¢       0.13
owy       0.13
chine     0.13
Activations Density 0.372%