Explanations
references to women and their experiences
Negative Logits
Token      Logit
females    -0.19
emales     -0.18
Female     -0.18
guys       -0.17
female     -0.17
males      -0.17
/she       -0.17
adies      -0.16
Female     -0.16
male       -0.15
Positive Logits
Token      Logit
hood       0.35
folk       0.28
/man       0.28
-child     0.24
izer       0.23
izing      0.22
/people    0.21
iac        0.21
ifest      0.21
uele       0.21
Activations Density: 0.056%