INDEX
Explanations
references to gender, specifically men, in relation to societal norms and stereotypes
references to men and their comparative roles or attributes in society
Negative Logits
ITS        -0.81
Assembly   -0.80
UGE        -0.78
REC        -0.75
UFF        -0.75
Ward       -0.73
IVERS      -0.70
OWN        -0.70
REDACTED   -0.67
Burn       -0.67
Positive Logits
volent        1.08
opausal       1.03
ejac          0.94
genitals      0.93
folk          0.86
ager          0.79
icide         0.79
friendships   0.77
sexually      0.72
pronouns      0.72
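The page does not say how the logit lists are produced. A common approach in sparse-autoencoder dashboards, assumed here and not confirmed by the source, is to multiply the feature's decoder direction by the model's unembedding matrix and then report the lowest-scoring (negative) and highest-scoring (positive) vocabulary tokens. The minimal pure-Python sketch below uses toy numbers. The functions `feature_logit_effects` and `top_and_bottom` are hypothetical, and any normalization behind the values shown above is not modeled.

```python
# Hypothetical sketch: rank vocabulary tokens by one feature's direct logit effect.
# Assumption: effect[j] = decoder_direction . unembedding[:, j]
# (unembedding is stored as d_model rows x vocab_size columns).

def feature_logit_effects(decoder_direction, unembedding, vocab):
    """Return (token, score) pairs, one per vocabulary entry."""
    d_model = len(decoder_direction)
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the feature direction with token j's unembedding column.
        score = sum(decoder_direction[i] * unembedding[i][j] for i in range(d_model))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Split scores into the k most suppressed and the k most promoted tokens."""
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]           # lowest scores first
    positive = ranked[::-1][:k]     # highest scores first
    return negative, positive


# Toy example: d_model = 2, vocabulary of 3 tokens.
scores = feature_logit_effects(
    [1.0, -1.0],
    [[2.0, 0.0, 1.0],
     [0.0, 1.0, 3.0]],
    ["a", "b", "c"],
)
negative, positive = top_and_bottom(scores, 2)
print(negative)  # [('c', -2.0), ('b', -1.0)]
print(positive)  # [('a', 2.0), ('b', -1.0)]
```

Under this reading, a token such as "volent" at the top of the positive list would be one whose logit this feature most directly increases when it fires.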
Activation Density: 0.110%
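Activation density is usually read as the fraction of token positions on which the feature fires. That reading is assumed here, and so is the use of a strictly-greater-than-zero threshold. On that reading, 0.110% means the feature is active on roughly 11 of every 10,000 tokens. The function name `activation_density` below is hypothetical.

```python
# Hypothetical sketch: activation density as the share of tokens where the feature fires.
# Assumption: "fires" means activation > threshold (default 0.0).

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy example: 11 active positions out of 10,000 tokens.
acts = [0.0] * 9989 + [0.5] * 11
density = activation_density(acts)
print(f"{density:.3%}")  # 0.110%
```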