Explanations
references to gender, specifically focusing on male and female entities
Negative Logits
Token            Logit
antaranya        -0.52
AnchorStyles     -0.50
abbraccio        -0.47
styleable        -0.46
WebServlet       -0.45
verifyException  -0.45
SBATCH           -0.45
irvana           -0.45
geddon           -0.44
inning           -0.44
Positive Logits
Token            Logit
male             3.63
Male             3.23
Male             3.22
male             3.08
MALE             2.91
MALE             2.66
female           2.55
Female           2.44
Female           2.42
males            2.41
Activations Density: 0.237%
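The density figure can be read as the fraction of sampled token positions on which the feature is active. The page does not define the metric, so this reading is an assumption, as are the function name `activation_density`, the `threshold` parameter, and the sample values below. A minimal sketch under those assumptions:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which
    the feature is active; the source page does not define the metric.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature is active on 3 of 1000 positions.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"{activation_density(sample):.3%}")  # prints 0.300%
```

Under this reading, a density of 0.237% would mean the feature is active on roughly 2 to 3 of every 1000 token positions.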