Explanations
Proper nouns, specifically the names of people and authors in academic contexts
Negative Logits
nore     -0.16
_sdk     -0.16
ordum    -0.15
-ci      -0.15
óż       -0.15
adors    -0.14
umba     -0.14
ανά      -0.14
966      -0.14
ureen    -0.14
Positive Logits
Jim      0.20
Jim      0.18
River    0.17
Merch    0.16
rame     0.16
Ace      0.16
Uri      0.15
Lunar    0.15
River    0.15
Fern     0.15
Activations Density: 0.066%