INDEX
Explanations
references to reproductive health and pregnancy-related conditions
Negative Logits
businesswoman  -0.92
herself        -0.92
herself        -0.88
actress        -0.87
actresses      -0.84
goddess        -0.81
Baroness       -0.80
Kaur           -0.79
Autorin        -0.76
heroine        -0.76
Positive Logits
pria           1.06
masculinity    1.06
manhood        1.00
male           1.00
gentlemen      0.99
gentleman      0.99
manly          0.99
mężczy         0.96
Gentleman      0.96
father         0.96
Activations Density 0.607%