INDEX
Explanations
proper nouns, specifically names of people
Negative Logits
isman       -0.17
roker       -0.17
cerer       -0.15
ický        -0.15
salesman    -0.15
esting      -0.15
nicos       -0.15
_consts     -0.14
eus         -0.14
offer       -0.14
Positive Logits
abeth       0.24
ie          0.23
atrice      0.22
ianne       0.22
ina         0.22
ette        0.21
ika         0.21
adora       0.21
anna        0.21
izabeth     0.20
Activations Density: 0.245%