INDEX
Explanations
pronouns associated with male and female characters
Negative Logits
Monfieur    -0.81
Theſe       -0.80
Conſ        -0.79
Perſ        -0.78
faſt        -0.77
pleaſure    -0.77
Reſ         -0.77
ſeveral     -0.74
Efq         -0.74
Eſ          -0.74
Positive Logits
he      2.51
He      1.96
she     1.95
He      1.83
his     1.64
they    1.61
him     1.50
She     1.47
She     1.44
she     1.40
Activations Density 0.156%