INDEX
Explanations
references to gender, particularly male and female
Negative Logits
Token           Logit
men             -0.94
man             -0.89
onViewCreated   -0.72
Zin             -0.69
MEN             -0.69
obatan          -0.69
WebServlet      -0.68
ly              -0.67
Notion          -0.65
Zat             -0.62
Positive Logits
Token           Logit
MALE            1.01
Male            0.99
MALE            0.98
Males           0.94
males           0.93
Females         0.92
emale           0.90
Males           0.90
mâle            0.89
Male            0.88
Activations Density: 0.032%