Explanations
mentions of a particular female individual
repeated references to the pronoun "her."
Negative Logits
Token      Logit
Process    -0.57
rahim      -0.56
ivo        -0.55
gaard      -0.54
DX         -0.54
otation    -0.54
rencies    -0.54
lite       -0.54
çīĪ        -0.53
agos       -0.53
Positive Logits
Token      Logit
her        3.09
hers       2.57
herself    2.50
Her        2.14
she        2.02
She        1.94
HER        1.90
Her        1.81
she        1.76
She        1.74
Activations Density: 0.097%
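
As a rough illustration of what a density figure like this can mean, the sketch below computes the fraction of token positions at which a feature is active. It assumes density is defined as "share of tokens with activation above a threshold"; the dashboard's exact definition is not stated here, and the function name `activation_density`, the threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. This is an illustrative definition, not the dashboard's.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, of which 10 are active.
sample = [0.0] * 9990 + [1.5] * 10
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.100%
```

Under this definition, a density of 0.097% would correspond to the feature firing on roughly one token in a thousand.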