Explanations
discussions about gender roles and societal expectations regarding women's behavior
Negative Logits
Token       Logit
меж         -0.15
iversity    -0.14
oze         -0.14
grandson    -0.13
Rac         -0.13
PWD         -0.13
زادÙĩ       -0.13
ocale       -0.13
ibold       -0.13
cele        -0.13
Positive Logits
Token       Logit
homem       0.26
roles       0.25
Roles       0.22
roles       0.21
Roles       0.20
domestic    0.19
men         0.19
passive     0.18
Eve         0.17
Domestic    0.17
Activations Density 0.185%