INDEX
Explanations
discussions around gender roles and the experiences of women
Negative Logits
adder    -0.15
engin    -0.15
oze      -0.14
ibold    -0.14
issen    -0.14
Rac      -0.14
ola      -0.13
cele     -0.13
rac      -0.13
æĮĤ      -0.13
Positive Logits
roles    0.25
homem    0.24
Roles    0.22
roles    0.21
Roles    0.21
domestic 0.20
passive  0.18
.roles   0.18
weaker   0.18
cooking  0.18
Activations Density 0.185%