Explanations
references to males or pronouns specifically associated with men
Negative Logits
Token            Logit
etheless         -0.83
intrusion        -0.71
DAY              -0.61
Solitaire        -0.60
reinforcement    -0.59
withdrawal       -0.59
BALL             -0.58
cred             -0.56
INGTON           -0.56
gems             -0.55
Positive Logits
Token            Logit
gemony           1.19
resy             1.18
arer             1.15
lder             1.13
isen             1.05
arers            1.05
ALTH             1.03
uristic          1.03
cht              1.01
idel             1.01
Activations Density: 0.031%