Explanations
references to women, particularly those of diverse backgrounds and their empowerment
Negative Logits
Token       Logit
OrUpdate    -0.16
female      -0.16
Female      -0.15
guy         -0.15
male        -0.15
emales      -0.15
urances     -0.15
females     -0.15
males       -0.14
(es         -0.14
Positive Logits
Token         Logit
folk          0.36
who           0.26
/man          0.25
hood          0.23
opause        0.21
zimmer        0.21
/g            0.21
/m            0.20
folk          0.20
empowerment   0.20
Activations Density 0.051%