INDEX
Explanations
female pronouns
mentions of a female subject in various contexts
Negative Logits
Jimmy      -0.62
fit        -0.61
ugu        -0.61
ensable    -0.60
Outside    -0.60
Jindal     -0.58
Vers       -0.58
full       -0.57
rax        -0.56
reprene    -0.56
Positive Logits
pher       1.23
pherd      1.08
pard       1.01
athed      0.97
ffield     0.96
'll        0.89
athing     0.87
'd         0.87
metic      0.86
ppard      0.83
Activations Density: 0.070%