Explanations
references to gender or gender-related terms, particularly focusing on the term "male"
mentions of male and female categories
Negative Logits
rieg          -0.79
bley          -0.79
leans         -0.78
heet          -0.77
lay           -0.75
ovie          -0.75
hemy          -0.74
ingen         -0.73
weet          -0.73
Deal          -0.72
Positive Logits
volent         1.75
genital        1.28
vol            0.99
anatomy        0.91
genitals       0.89
infertility    0.87
counterparts   0.87
circumcision   0.86
reproductive   0.85
gaze           0.85
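The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up (positive) or down (negative). A minimal sketch of how such lists could be produced, assuming (this page does not say so) that each value is the dot product of the feature's decoder direction with one column of the model's unembedding matrix. The function names, the toy vectors, and the three-token vocabulary are hypothetical.

```python
def logit_effects(decoder_dir, unembed):
    """Project a feature's decoder direction onto every vocabulary token.

    decoder_dir: list of floats, length d_model (hypothetical feature direction).
    unembed: list of d_model rows, each a list of length vocab_size.
    Returns one logit effect per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_and_bottom(effects, tokens, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # smallest effects first
    positive = ranked[::-1][:k]  # largest effects first
    return positive, negative


# Toy example: d_model = 2, vocabulary of three made-up tokens.
tokens = ["a", "b", "c"]
effects = logit_effects([1.0, 0.5], [[1, 0, -1], [0, 4, 0]])
positive, negative = top_and_bottom(effects, tokens, k=1)
print(positive, negative)
```

Sorting the full vector once and slicing both ends keeps the positive and negative lists consistent with each other; on a real vocabulary of tens of thousands of tokens a partial selection such as `heapq.nlargest` would avoid the full sort.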
Activation Density: 0.068%
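Assuming activation density means the percentage of sampled tokens on which the feature's activation is strictly positive (the threshold is an assumption, not stated here), 0.068% corresponds to roughly 68 of every 100,000 tokens. A minimal sketch of that calculation; the function name and the sample activations are hypothetical.

```python
def activation_density(activations):
    """Percentage of tokens whose activation is strictly positive (assumed firing rule)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > 0)
    return 100.0 * fired / len(activations)


# Hypothetical activations over five tokens: the feature fires on two of them.
print(activation_density([0.0, 0.0, 0.3, 0.0, 1.2]))
```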