INDEX
Explanations
adjectives associated with gender characteristics, particularly masculine and feminine
references to gender roles, specifically masculine and feminine attributes
Negative Logits
Assembly     -0.86
Deal         -0.77
Reviewer     -0.75
Tour         -0.73
EV           -0.72
oulos        -0.69
eding        -0.68
Reviewed     -0.68
EVA          -0.67
Publisher    -0.67
Positive Logits
masculinity  1.15
masculine    1.14
feminine     1.00
mascul       0.99
femin        0.87
istries      0.86
citiz        0.83
femin        0.82
WithNo       0.80
submar       0.80
Activations Density: 0.015%