Explanations
references to gender, particularly the presence of men and women in various contexts
Negative Logits
нь        -0.15
utan      -0.15
ollipop   -0.15
VIC       -0.15
oron      -0.15
lesi      -0.14
Male      -0.14
дром      -0.14
icina     -0.14
ritis     -0.14
Positive Logits
women     0.59
woman     0.54
women     0.49
Women     0.43
Women     0.42
WOM       0.42
女人      0.41
woman     0.41
Woman     0.40
mujeres   0.38
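The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits up or down. Below is a minimal sketch of how such lists can be produced, assuming the scores are the direct projection of the feature's decoder vector onto the model's unembedding matrix, with the final LayerNorm ignored. The function name `top_logit_effects` and the toy matrices are hypothetical illustrations, not the dashboard's actual code.

```python
import numpy as np

def top_logit_effects(decoder_row, unembed, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature direction.

    Assumption: decoder_row has shape (d_model,) and unembed has shape
    (d_model, vocab_size); LayerNorm folding is ignored in this sketch.
    Returns (positive, negative) lists of (token, score) pairs.
    """
    scores = decoder_row @ unembed          # one score per vocabulary token
    order = np.argsort(scores)              # indices sorted ascending by score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example (hypothetical data): 2-dimensional residual stream, 4-token vocabulary.
vocab = ["women", "woman", "Male", "ritis"]
unembed = np.array([[1.0, 0.5, -0.2, -0.1],
                    [0.0, 0.5,  0.0, -0.3]])
positive, negative = top_logit_effects(np.array([0.6, 0.1]), unembed, vocab, k=2)
print(positive)
print(negative)
```

Sorting the full score vector once and reading it from both ends gives the top-k positive and top-k negative tokens in a single pass.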
Activation Density: 0.036%
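Activation density is the fraction of token positions on which the feature fires, so 0.036% corresponds to roughly 36 firing positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly greater than zero; the helper `activation_density` and the sample data are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > threshold (default 0).
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Hypothetical data: the feature fires on 3 of 10,000 token positions.
acts = np.zeros(10_000)
acts[[3, 250, 9000]] = [1.2, 0.4, 2.0]
print(f"{activation_density(acts) * 100:.3f}%")  # 0.030%
```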