Explanations
proper names
mentions of specific female names
Negative Logits
shit       -0.79
cffff      -0.75
asper      -0.72
ozy        -0.71
pelled     -0.71
pec        -0.69
yg         -0.69
doors      -0.69
sed        -0.69
*/(        -0.68
Positive Logits
Louise      1.12
Marie       1.08
Thatcher    1.06
Anne        1.06
DeVos       1.05
Mae         1.03
Jane        1.02
Marie       1.01
herself     1.00
Margaret    0.98
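The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. A minimal sketch of how such lists are commonly produced, assuming (this is not stated on the page) that each token's value is the dot product of the feature's decoder direction with that token's unembedding vector. The helper names `logit_effects` and `top_bottom_logits` are hypothetical, and the example uses toy numbers, not this feature's real weights.

```python
def logit_effects(decoder_direction, unembedding):
    """Project a feature's decoder direction onto every token's unembedding vector.

    Assumption: `unembedding` is a list of rows, one row of d_model floats per
    vocabulary token, so each result is that token's direct logit effect.
    """
    return [sum(d * w for d, w in zip(decoder_direction, row)) for row in unembedding]


def top_bottom_logits(effects, vocab, k=10):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example (hypothetical 2-dimensional model with a 3-token vocabulary).
effects = logit_effects([1.0, 0.0], [[2.0, 5.0], [-1.0, 3.0], [0.5, 0.0]])
pos, neg = top_bottom_logits(effects, ["tok_a", "tok_b", "tok_c"], k=1)
print(pos, neg)
```

Because the ranking is by signed value, a single token cannot appear in both lists unless k exceeds half the vocabulary. Repeated surface forms such as the two "Marie" entries can still occur when the vocabulary holds distinct tokens that render alike (for example, with and without a leading space).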
Activation Density: 0.064%
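Activation density is usually read as the share of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero. The page does not state the threshold or the sample dataset, and the helper `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: `activations` holds one activation value per sampled token.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: 64 active tokens out of 100,000 gives a density of 0.064%.
acts = [0.0] * 99_936 + [1.5] * 64
print(activation_density(acts))
```

At this density, the feature is active on roughly 1 token in 1,600, which fits a narrow feature such as one tracking specific female names.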