Explanations
proper nouns, particularly names of people
female names and surnames
Negative Logits
cessite	-0.31
foglal	-0.31
leyebilirsiniz	-0.29
Ră	-0.28
מק	-0.27
ække	-0.27
æl	-0.26
`<	-0.26
ാല	-0.25
wrong	-0.25
Positive Logits
Olga	1.13
Sonia	1.07
Tanya	1.04
Lydia	1.03
Wanda	1.02
Nadia	1.01
Elsa	1.00
Olga	1.00
Ingrid	0.98
Miriam	0.95
Activations Density: 0.006%