INDEX
Explanations
references to male individuals or male-related terms
Negative Logits
ängerin          -0.44
Administrativna  -0.40
İstinadlar       -0.37
pani             -0.35
hipping          -0.33
pac              -0.32
CONDITIONS       -0.31
aguya            -0.31
omyces           -0.31
Foer             -0.31
POSITIVE LOGITS
masculine    1.05
Mascul       0.98
masculin     0.97
male         0.97
masculino    0.95
manly        0.94
Male         0.93
masculinity  0.93
masculina    0.93
males        0.93
Activations Density 0.111%
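
An activation density like the one above is commonly read as the fraction of token positions on which the feature fires. The sketch below illustrates that reading only. It assumes "fires" means an activation strictly greater than zero; the page does not state the threshold or the sample it was computed over. The function name `activation_density` and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a position counts as active when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy sample: 1 active position out of 900.
toy_activations = [0.0] * 899 + [1.5]

density = activation_density(toy_activations)
print(f"Activations Density {density:.3%}")
```

On this toy sample the feature is active on 1 of 900 positions, which is the same order of sparsity as the figure reported above.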