Explanations
adjectives related to gender characteristics
references to gender and the concepts of masculinity and femininity
Negative Logits
Token       Logit
Assembly    -0.77
oulos       -0.74
akings      -0.74
Deal        -0.72
govtrack    -0.71
rella       -0.70
Sound       -0.67
rocket      -0.67
Chip        -0.67
eering      -0.66
Positive Logits
Token       Logit
masculine   0.98
feminine    0.95
inity       0.91
femin       0.91
mascul      0.88
inant       0.82
istries     0.82
pronouns    0.81
atan        0.80
hygiene     0.79
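The token lists above are the kind of output commonly obtained by projecting a feature's output direction through a model's unembedding matrix and keeping the tokens with the most extreme values. The page does not say how its numbers were computed, so the following is only a minimal sketch under that assumption. The names `top_logit_tokens`, `feature_direction`, `unembed`, and `vocab` are hypothetical, and the toy numbers are illustrative rather than taken from this feature.

```python
def top_logit_tokens(feature_direction, unembed, vocab, k=10):
    """Return (most_negative, most_positive) lists of (token, logit) pairs.

    Assumption: feature_direction has length d_model and unembed is a
    d_model x vocab_size matrix given as a list of rows. Both inputs are
    hypothetical stand-ins, not data from the page.
    """
    # logits[j] = sum over i of feature_direction[i] * unembed[i][j]
    logits = [
        sum(d * row[j] for d, row in zip(feature_direction, unembed))
        for j in range(len(vocab))
    ]
    # Sort ascending by logit: the head is most negative, the tail most positive.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy 2-dimensional model with a 4-token vocabulary (illustrative values only).
vocab = ["masculine", "feminine", "rocket", "Assembly"]
unembed = [
    [1.0, 0.9, -0.6, -0.8],
    [0.0, 0.1, 0.2, 0.0],
]
feature_direction = [1.0, 0.0]

negative, positive = top_logit_tokens(feature_direction, unembed, vocab, k=2)
for token, logit in positive:
    print(f"{token:<12}{logit:+.2f}")
```

Sorting once and reading both ends keeps the positive and negative lists consistent with each other; a real pipeline would more likely use a vectorized matrix product over the full vocabulary.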
Activations Density: 0.018%
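An activation density is usually read as the share of sampled tokens on which the feature is active. The page does not define the term, so the sketch below assumes "active" means an activation strictly above a threshold of zero. `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of entries whose activation exceeds the threshold.

    Assumption: one activation value per sampled token, and a token
    counts as active when its value is strictly greater than threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Illustrative sample: the feature fires on 1 token out of 10,000.
sample = [0.0] * 9_999 + [3.2]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.018% would correspond to roughly 18 active tokens per 100,000 sampled.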