INDEX
Explanations
references to male identities and roles in various contexts
Negative Logits
Token          Logit
herself        -1.02
bint           -0.84
herself        -0.72
حياتها         -0.68
Kaur           -0.65
protoimpl      -0.65
脚注の使い方    -0.64
Arund          -0.61
могла          -0.58
цезда          -0.57
Positive Logits
Token          Logit
himself        1.26
himself        1.09
masculinity    0.98
manhood        0.97
Himself        0.87
Males          0.83
مردانه         0.82
masculino      0.82
Male           0.81
męski          0.81
Activations Density 1.158%