Explanations
proper nouns, particularly names of people and places
Negative Logits
_singleton  -0.17
rál         -0.15
mah         -0.15
นะ          -0.15
zac         -0.14
Sinclair    -0.14
olini       -0.14
рут         -0.14
yum         -0.14
ná          -0.14
Positive Logits
laus        0.32
law         0.29
lav         0.28
ław         0.28
isl         0.28
лав         0.25
islav       0.23
law         0.21
sl          0.21
lava        0.21
Activations Density 0.029%