Explanations
gender-specific pronouns and references to female characters
Negative Logits
hani          -0.18
hausen        -0.15
dorf          -0.15
ibraltar      -0.15
endi          -0.14
Affero        -0.14
šet           -0.14
kle           -0.14
chner         -0.14
986           -0.14
Positive Logits
anship        0.17
vsp           0.16
Mes           0.15
{[            0.14
ulp           0.14
ior           0.13
FK            0.13
sak           0.13
ृत            0.13
.Navigation   0.13
Activations Density 0.417%