INDEX
Explanations
proper nouns, specifically names of individuals or organizations
Negative Logits
asco      -0.20
ynes      -0.15
athi      -0.15
likes     -0.15
anes      -0.15
richt     -0.15
æ»        -0.15
iral      -0.14
Yen       -0.14
aries     -0.14
Positive Logits
ong       0.29
angling   0.28
eng       0.27
ang       0.26
aoke      0.26
ulong     0.26
ao        0.26
uan       0.25
ongyang   0.24
uling     0.24
Activations Density: 0.042%