INDEX
Explanations
mentions of gender, specifically focusing on males
mentions of the word "male."
Negative Logits
GOODMAN     -0.80
krit        -0.77
heet        -0.75
etsk        -0.73
leans       -0.72
arella      -0.69
rep         -0.69
duration    -0.68
xtap        -0.66
today       -0.66
Positive Logits
volent          1.84
genital         1.01
vol             0.97
ejac            0.93
genitals        0.88
infertility     0.85
supremacy       0.81
circumcision    0.79
supremacist     0.79
reproductive    0.78
Activations Density 0.018%