Explanations
pronouns associated with male individuals
Negative Logits
Token      Logit
iciel      -0.15
umper      -0.15
-webpack   -0.15
ZemÄĽ      -0.14
è»         -0.14
WEEN       -0.14
æĿ         -0.14
\base      -0.13
otec       -0.13
deniz      -0.13
Positive Logits
Token      Logit
/her       0.20
or         0.18
/she       0.17
.her       0.16
idi        0.15
éľ         0.14
gol        0.14
ãĤ·ãĥ£     0.14
Richards   0.14
123        0.14
Activations Density 0.111%