Explanations
terms and phrases associated with men and masculine identity
Negative Logits
']")    -1.17
')")    -1.15
.}}     -0.98
$")     -0.95
."]     -0.94
^(@)    -0.94
]]      -0.94
"]]     -0.93
"/")    -0.92
%")     -0.92
Positive Logits
men       1.63
Men       1.57
MEN       1.36
Men       1.33
men       1.08
man       1.06
MEN       1.01
Man       0.97
hommes    0.95
uomini    0.93
Activations Density 0.050%