Explanations
first names
names of female characters or individuals mentioned in the context
Negative Logits
Token     Logit
sed       -1.09
lasses    -0.97
lot       -0.96
pan       -0.92
dn        -0.92
yg        -0.90
er        -0.88
hire      -0.88
erate     -0.86
err       -0.86
Positive Logits
Token      Logit
Mae        1.04
Devi       1.00
Marie      1.00
herself    0.96
Grande     0.85
Marie      0.80
Louise     0.77
Christina  0.77
Fey        0.77
Herrera    0.77
Activations Density 0.145%