Explanations
pronouns referring to females
references to a specific female subject
Negative Logits
Token       Logit
kefeller    -0.90
odder       -0.71
church      -0.69
ornia       -0.67
perties     -0.67
atory       -0.65
Jindal      -0.65
folios      -0.63
Vers        -0.62
ENSE        -0.62
Positive Logits
Token       Logit
pher        1.35
athed       1.21
'd          1.12
athing      1.11
pherd       1.07
ffield      1.04
pard        1.01
'll         1.00
ppard       0.96
ldon        0.85
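Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix, then sorting the resulting per-token scores. The sketch below assumes that procedure. This page does not confirm it. The names decoder_vec, W_U, vocab and top_logits are hypothetical, and the example uses a toy two-dimensional model.

```python
import numpy as np


def top_logits(decoder_vec, W_U, vocab, k=10):
    """Hypothetical helper: return the k most negative and k most positive
    (token, logit) pairs for a feature's decoder direction.

    Assumes decoder_vec has shape (d_model,) and W_U has shape (d_model, vocab_size).
    """
    logits = decoder_vec @ W_U          # one score per vocabulary token
    order = np.argsort(logits)          # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: a 2-dimensional model with a 4-token vocabulary.
decoder_vec = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 1.3, 0.0],
                [9.0, 9.0, 9.0, 9.0]])
vocab = ["a", "b", "c", "d"]
neg, pos = top_logits(decoder_vec, W_U, vocab, k=2)
print(neg)  # most negative tokens first
print(pos)  # most positive tokens first
```

Sorting the full projection once gives both tables. The negative list comes from the front of the ascending order, and the positive list comes from its reverse.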
Activation Density: 0.103%
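An activation density figure is usually read as the percentage of sampled tokens on which the feature fires. The sketch below assumes that "fires" means an activation strictly above a threshold of zero. The dashboard's actual threshold and token sample are not stated on this page, and activation_density is a hypothetical name.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold.

    Assumes `activations` holds one feature activation per sampled token.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Two of ten tokens are active, giving a density of 20%.
acts = [0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.2]
print(activation_density(acts))
```

On this reading, 0.103% means the feature is active on roughly one token in a thousand.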