INDEX
Explanations
references to women or females in various contexts, such as combat roles, management positions, cosmetic surgery, and political preferences
Negative Logits
Token       Logit
REDACTED    -0.89
UFF         -0.81
-+-+        -0.76
RAY         -0.75
ebus        -0.74
REC         -0.72
EMA         -0.71
æĸ¹         -0.71
rador       -0.70
eme         -0.70
Positive Logits
Token           Logit
folk            1.24
empowerment     1.03
genital         0.99
opausal         0.94
breasts         0.93
menstru         0.92
hood            0.87
contraceptive   0.87
reproductive    0.86
volent          0.83
Activations Density: 0.071%