Explanations
proper nouns, particularly names of people and places
Negative Logits
opus     -0.15
ienes    -0.15
ianne    -0.15
ieres    -0.15
ocht     -0.14
peer     -0.14
mys      -0.14
Vě       -0.14
blas     -0.14
iere     -0.14
Positive Logits
ovic     0.29
ić       0.24
ic       0.23
Serbian  0.21
Drag     0.20
ć        0.20
Milo     0.20
olic     0.20
Mil      0.20
acic     0.20
Activations Density: 0.022%