Explanations
terms related to gender identity
terms related to gender identity and expression
Negative Logits
Token       Logit
dilig       -0.71
Prices      -0.70
Money       -0.66
Hur         -0.65
Wast        -0.65
Completed   -0.64
Oper        -0.64
overlook    -0.62
SAL         -0.62
UF          -0.62
Positive Logits
Token         Logit
bians         0.96
genders       0.86
omas          0.85
stereotypes   0.85
lesbian       0.85
femin         0.83
lesbians      0.82
isexual       0.82
transgender   0.81
pronouns      0.81
Activations Density: 0.328%