INDEX
Explanations
proper nouns and geographical locations
NEGATIVE LOGITS
城市      -0.15
Cities    -0.14
都市      -0.14
ег        -0.14
ég        -0.14
HEL       -0.13
leys      -0.13
less      -0.13
ML        -0.13
vivo      -0.13
POSITIVE LOGITS
shire     0.28
ensis     0.28
ian       0.22
iens      0.21
-based    0.19
ians      0.18
-born     0.17
Slim      0.17
erdale    0.16
-area     0.16
Activations Density 0.223%