Explanations
terms relating to gender
references to gender issues and discussions surrounding gender equality
Negative Logits
itous     -0.72
ģĸ        -0.69
ernels    -0.69
WT        -0.67
Landing   -0.67
enium     -0.66
Grave     -0.65
heng      -0.65
BLIC      -0.63
Pub       -0.63
Positive Logits
dysph     1.20
equality  1.11
imbalance 1.05
identity  1.04
bent      1.04
flu       1.03
bending   1.01
roles     0.99
que       0.97
pronouns  0.97
Activations Density: 0.038%