INDEX
Explanations
references to male and female gender distinctions in various contexts
Negative Logits
Notion    -0.78
Things    -0.75
ed        -0.73
things    -0.71
man       -0.70
ap        -0.69
uğ        -0.68
عج        -0.66
THINGS    -0.66
Tutto     -0.65
Positive Logits
Male      1.39
MALE      1.37
MALE      1.31
male      1.27
males     1.20
Male      1.18
Males     1.16
FEMALE    1.15
Females   1.11
Males     1.08
Activations Density 0.074%