Explanations
references to gender and women's issues
Negative Logits
Token        Logit
Woman        -0.26
woman        -0.25
Woman        -0.25
woman        -0.24
женщина      -0.22  (Russian: "woman")
Womens       -0.22
Women        -0.20
mulher       -0.19  (Portuguese: "woman")
vrouw        -0.18  (Dutch: "woman")
女人          -0.18  (Chinese: "woman")
Positive Logits
Token        Logit
men          0.34
-men         0.25
men          0.25
children     0.25
Men          0.23
gentlemen    0.23
Men          0.21
children     0.21
Children     0.20
hommes       0.20  (French: "men")
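Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that method (the source does not state how its values were computed) and ignores any final LayerNorm. `decoder_dir`, `unembed`, `feature_logit_effects`, and `top_and_bottom` are hypothetical names, and the toy vectors are made up for illustration.

```python
# Hypothetical sketch: rank vocabulary tokens by how strongly a feature's
# decoder direction raises or lowers their logits. Assumes the direct effect
# on a token's logit is dot(decoder_dir, unembedding column for that token),
# ignoring the final LayerNorm.
from typing import Dict, List, Tuple


def feature_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
) -> Dict[str, float]:
    """Return token -> dot product of decoder_dir with that token's column."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy two-dimensional example with invented numbers.
    effects = feature_logit_effects(
        [1.0, -1.0],
        {"men": [0.3, 0.0], "woman": [0.0, 0.25], "the": [0.1, 0.1]},
    )
    pos, neg = top_and_bottom(effects, k=1)
    print("positive:", pos)
    print("negative:", neg)
```

In a real model the unembedding has one column per vocabulary entry, so case and spacing variants of the same word (for example, tokens with and without a leading space) each get their own score.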
Activation Density: 0.029%
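Activation density is usually read as the share of sampled tokens on which the feature fires, so 0.029% corresponds to roughly 29 of every 100,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (`activation_density` and `threshold` are hypothetical names, not taken from the source):

```python
# Hypothetical sketch: percentage of tokens whose feature activation
# exceeds a threshold (assumed to be 0.0, i.e. any nonzero activation).
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly greater than threshold."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


if __name__ == "__main__":
    # 29 active tokens out of 100,000 gives a density of about 0.029%.
    sample = [1.0] * 29 + [0.0] * (100_000 - 29)
    print(f"{activation_density(sample):.3f}%")
```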