Explanations
Proper nouns or titles, particularly those related to political figures or entities.
Negative Logits
verts      -0.16
ihn        -0.15
ound       -0.15
амп        -0.14
Pam        -0.14
ogie       -0.14
keh        -0.14
oundary    -0.14
lix        -0.14
ete        -0.14
Positive Logits
äs         0.34
ä          0.26
äsent      0.25
inz        0.25
unks       0.23
unk        0.23
akt        0.22
isen       0.21
aes        0.21
äh         0.20
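Lists like the positive and negative logits are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The sketch below assumes that method and uses NumPy with hypothetical names (`top_logit_tokens`, `w_dec_row`, `w_unembed`, `vocab`); it is an illustration under those assumptions, not this dashboard's actual code.

```python
import numpy as np


def top_logit_tokens(w_dec_row, w_unembed, vocab, k=10):
    """Hypothetical helper: project one feature direction onto the vocabulary.

    Assumes w_dec_row has shape (d_model,) and w_unembed has shape
    (d_model, vocab_size). Returns (negative, positive) lists of
    (token, logit) pairs, most extreme first.
    """
    logits = w_dec_row @ w_unembed          # one score per vocabulary token
    order = np.argsort(logits)              # indices sorted ascending by score
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with made-up weights (not the values shown on this page).
vocab = ["a", "b", "c", "d"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.3, -0.2, 0.1, 0.0],
                      [5.0, 5.0, 5.0, 5.0]])
neg, pos = top_logit_tokens(w_dec_row, w_unembed, vocab, k=2)
print(neg, pos)
```

Because only the feature direction is projected, these scores describe which tokens the feature pushes up or down in the output, independent of any particular input.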
Activation Density: 0.006%
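Activation density is usually read as the percentage of tokens on which the feature fires, so 0.006% corresponds to roughly 6 activating tokens per 100,000. A minimal sketch, assuming a simple "activation above a threshold" rule and a hypothetical `activation_density` helper (the dashboard's exact threshold and token sample are not stated here):

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# One firing token out of four gives 25% density.
print(activation_density([0.0, 0.0, 0.5, 0.0]))
```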