Explanations
phrases related to gender equality and women's empowerment
Negative Logits
inos        -0.16
industry    -0.15
vanity      -0.14
Caucasian   -0.14
american    -0.14
interracial -0.14
OTA         -0.13
/MIT        -0.13
Coc         -0.13
elines      -0.13
Positive Logits
violence     0.27
GB           0.27
Violence     0.24
SR           0.22
GB           0.22
rights       0.21
-viol        0.21
Sexual       0.20
sexual       0.20
Viol         0.20
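The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). This page does not state how the numbers were produced. A common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and then take the k lowest and k highest entries. The minimal pure-Python sketch below follows that approach. The helper names `logit_effects` and `top_bottom` are hypothetical, and the toy vocabulary and weights are invented for illustration. They are not this feature's real values.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature's decoder direction (length d_model) with each
    column of the unembedding matrix (d_model x n_vocab), giving one
    logit effect per vocabulary token. Hypothetical helper."""
    d_model = len(decoder_dir)
    n_vocab = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
            for t in range(n_vocab)]

def top_bottom(vocab: Sequence[str], effects: Sequence[float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]            # lowest effects first
    positive = ranked[::-1][:k]      # highest effects first
    return negative, positive

# Toy, made-up values (d_model = 2, vocabulary of 4 tokens).
vocab = ["violence", "rights", "industry", "vanity"]
decoder_dir = [1.0, 0.5]
unembed = [[0.25, 0.125, -0.25, -0.125],
           [0.0, 0.0, 0.0, -0.5]]

effects = logit_effects(decoder_dir, unembed)
neg, pos = top_bottom(vocab, effects, k=2)
print(neg)  # [('vanity', -0.375), ('industry', -0.25)]
print(pos)  # [('violence', 0.25), ('rights', 0.125)]
```

A real dashboard would apply the same projection over the full vocabulary, usually with a matrix library rather than Python loops.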
Activation density: 0.055%
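Activation density is usually read as the percentage of sampled tokens on which the feature fires. This sketch assumes that "fires" means an activation strictly above zero, because the page does not define the threshold. Under that reading, 0.055% means roughly 55 firing tokens out of every 100,000. The function name `activation_density` and the synthetic activations are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Synthetic example: the feature fires on 55 of 100,000 tokens.
acts = [0.0] * 99_945 + [1.0] * 55
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.055%
```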