Explanations
references to male individuals and their associated behaviors or characteristics
Negative Logits
ängerin         -0.40
xinh            -0.39
putri           -0.38
ExecuteReader   -0.35
สวย             -0.35
quedado         -0.35
Actress         -0.35
presidenta      -0.35
करती            -0.35
Autorin         -0.35
Positive Logits
męski           1.10
masculinity     1.10
manly           1.06
manhood         1.05
mascul          1.05
masculina       1.04
męskie          0.99
Mascul          0.99
masculinos      0.98
masculine       0.97
Activations Density: 1.013%