Explanations
references to male-related concepts or gender-specific terms
mentions of "male" in various contexts related to gender dynamics
Negative Logits
Token           Logit
IVERS           -0.70
uilt            -0.70
GOODMAN         -0.69
ovie            -0.69
Deal            -0.68
krit            -0.67
ravel           -0.66
Manufacturer    -0.66
rep             -0.64
overed          -0.64
Positive Logits
Token           Logit
volent          1.99
vol             1.10
genital         1.03
ejac            0.92
infertility     0.90
genitals        0.86
supremacists    0.84
vich            0.83
anatomy         0.79
supremacist     0.79
Activations Density: 0.025%