Explanations
proper nouns, particularly names of people and places
Negative Logits
Token    Logit
mys      -0.15
½        -0.15
ůr       -0.14
ieres    -0.14
Via      -0.14
Via      -0.14
ckt      -0.14
ło       -0.13
iere     -0.13
iron     -0.13
Positive Logits
Token    Logit
ovic     0.31
ić       0.26
ivic     0.25
Milo     0.25
ic       0.25
Mil      0.22
đ        0.22
Petro    0.22
Cv       0.21
ć        0.21
Activations Density 0.028%
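
Token strings in logit listings like the ones above often come from a byte-level BPE vocabulary, where every UTF-8 byte is shown as a stand-in character, so `ć` can appear as `Äĩ` and `ł` as `ÅĤ`. The sketch below assumes a GPT-2-style byte-to-character table (an assumption; the tokenizer is not named here) and maps such display strings back to readable text. The names `byte_alphabet` and `decode_token` are illustrative, not part of any dashboard API.

```python
def byte_alphabet():
    """Build a GPT-2-style byte -> display-character table (assumed tokenizer)."""
    # Bytes that are already printable keep their own character.
    keep = set(
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {}
    shift = 0
    for b in range(256):
        if b in keep:
            table[b] = chr(b)
        else:
            # Remaining bytes are assigned code points 256, 257, ... in order.
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse lookup: display character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in byte_alphabet().items()}


def decode_token(display):
    """Map a byte-level display string back to UTF-8 text."""
    raw = bytearray()
    for ch in display:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Characters outside the table are passed through unchanged.
            raw.extend(ch.encode("utf-8"))
    # A token may hold only part of a multi-byte character, so do not raise.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for shown in ["iÄĩ", "ÅĤo", "Milo"]:
        print(repr(shown), "->", repr(decode_token(shown)))
```

Plain ASCII tokens such as `Milo` pass through unchanged. A token that holds only a fragment of a multi-byte character decodes to the Unicode replacement character, which is why such fragments are hard to read on their own.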