Explanations
proper nouns, especially names of individuals and organizations
Negative Logits
atown     -0.19
stown     -0.17
platz     -0.15
etwork    -0.15
abbo      -0.15
olg       -0.15
Verse     -0.15
lingen    -0.15
lymp      -0.15
ynos      -0.15
Positive Logits
man       0.20
disadv    0.16
veis      0.15
ìĦł       0.15
/Library  0.15
MAN       0.14
handler   0.14
質        0.14
III       0.14
ман       0.14
Activations Density 0.215%