INDEX
Explanations
names of individuals
proper nouns, particularly names
Negative Logits
龍契士      -0.90
sburgh      -0.78
覚醒        -0.67
BILITY      -0.66
作          -0.64
butterfly   -0.64
BILITIES    -0.63
ENCE        -0.62
ments       -0.62
ITED        -0.61
Positive Logits
igans       1.08
ghai        1.08
amar        1.06
kees        1.04
amo         1.00
seys        0.99
agos        0.98
atta        0.96
amia        0.95
allo        0.93
Activations Density 0.033%