INDEX
Explanations
phrases related to patriarchal values and gender roles
Negative Logits
racial        -0.17
racial        -0.17
Stam          -0.14
Ethnic        -0.14
volatile      -0.14
jac           -0.14
homosexual    -0.14
ethnic        -0.14
jay           -0.14
682           -0.14
Positive Logits
girls         0.17
society       0.16
adox          0.15
Girls         0.14
lev           0.14
λια           0.14
boys          0.14
Trophy        0.14
boys          0.14
.Dom          0.14
Activations Density 0.059%