Explanations
references to gender roles and stereotypes related to masculinity
Negative Logits
,strlen    -0.14
igin       -0.14
]={↵       -0.13
.enumer    -0.13
ÙĦس        -0.13
Türkçe     -0.13
_banner    -0.13
itech      -0.13
anvas      -0.13
Shield     -0.13
Positive Logits
society        0.31
norms          0.27
expectations   0.27
pressure       0.26
conformity     0.26
Pressure       0.25
societal       0.24
norm           0.24
Society        0.24
expectation    0.23
Activation density: 0.173%