INDEX
Explanations
pronouns that denote gender and quantify representation
Negative Logits
member        -0.18
riter         -0.16
uben          -0.15
endir         -0.15
ennes         -0.15
mj            -0.15
adera         -0.15
emme          -0.15
Ŀ             -0.15
zend          -0.15
Positive Logits
counterparts   0.27
peers          0.25
counter        0.19
contempor      0.18
-counter       0.18
cohorts        0.17
challeng       0.17
peer           0.17
colleagues     0.17
fellows        0.16
Activations Density: 0.067%