Explanations
names and surnames, particularly those with specific endings
Negative Logits
amer      -0.17
leton     -0.17
ermann    -0.17
ichi      -0.16
emer      -0.15
.sul      -0.15
ibal      -0.15
Mutation  -0.14
alu       -0.14
ponce     -0.14
Positive Logits
rosso     0.15
hale      0.14
VID       0.14
.simps    0.14
rub       0.14
Tut       0.14
bud       0.13
Aub       0.13
sơ        0.13
++.       0.13
Activations Density 0.004%