INDEX
Explanations
proper nouns, particularly names of places and cities
Negative Logits
etler      -0.18
iddi       -0.15
oner       -0.14
mür        -0.14
Rooney     -0.14
–and       -0.14
uard       -0.13
lies       -0.13
ierge      -0.13
ozor       -0.13
Positive Logits
sville     0.26
neapolis   0.26
apolis     0.26
ville      0.25
古屋       0.25
Hague      0.24
stown      0.24
adelphia   0.24
-town      0.22
zhou       0.22
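The two lists above are presumably the tokens whose output logits this feature most decreases and most increases. A minimal sketch of how such lists are commonly computed for sparse-autoencoder feature dashboards follows. It assumes the logit effect is the feature's decoder direction multiplied by the model's unembedding matrix, which the dashboard itself does not state. The names `logit_effects`, `top_logits`, `decoder_direction`, and `unembed` are hypothetical, and the toy numbers are illustrative, not the values listed here.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature direction through the unembedding matrix.

    Hypothetical layout: `unembed` has d_model rows and vocab_size columns,
    so the result holds one logit contribution per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(d * row[j] for d, row in zip(decoder_direction, unembed))
        for j in range(vocab_size)
    ]


def top_logits(effects: Sequence[float], vocab: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive


# Toy example: a 2-dimensional model with a 4-token vocabulary.
vocab = ["ville", "apolis", "Rooney", "the"]
decoder_direction = [1.0, 0.5]
unembed = [
    [0.2, 0.1, -0.1, 0.0],
    [0.1, 0.2, -0.2, 0.0],
]
negative, positive = top_logits(logit_effects(decoder_direction, unembed), vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

Real dashboards may also fold in a final layer norm or other scaling before ranking; the sketch leaves that out to show only the projection and top-k step.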
Activation Density: 0.585%
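Activation density is the fraction of evaluated tokens on which the feature is active, so 0.585% corresponds to roughly 585 activating tokens per 100,000. The sketch below is minimal and assumes "active" means an activation strictly above zero, since the dashboard does not state its threshold. `activation_density` is a hypothetical helper, and the sample data is made up.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed 0.0)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy sample: the feature fires on 5 of 1,000 tokens.
acts = [0.0] * 995 + [1.3] * 5
density = activation_density(acts)
print(f"Activation density: {density:.3%}")
```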