INDEX
Explanations
references to racial dynamics and stereotypes in relationships
Negative Logits
Token       Logit
Male        -0.78
Gender      -0.75
Male        -0.75
Gender      -0.66
Daughter    -0.63
MALE        -0.63
Boyfriend   -0.62
male        -0.62
gender      -0.59
Husband     -0.58
Positive Logits
Token       Logit
women       1.77
women       1.13
vrouwen     0.98
WOMEN       0.96
mujeres     0.94
ladies      0.89
woman       0.86
женщин      0.86
Women       0.84
kvinder     0.84
Activations Density: 0.320%