Explanations
references to significant women in history or prominent female figures
following commas referring to females
female royalty and historical figures
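Negative Logits
Token          Logit   English gloss (non-English tokens)
himself        -1.14
himself        -0.99
seines         -0.84   German: "his"
brotherhood    -0.81
łbym           -0.81   Polish masculine 1st-person conditional ending: "I would"
Himself        -0.81
وفاته          -0.80   Arabic: "his death"
his            -0.80
彼は           -0.79   Japanese: "he" + topic particle
boyhood        -0.79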
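Positive Logits
Token          Logit   English gloss (non-English tokens)
herself         2.07
her             1.64
she             1.57
herself         1.56
그녀            1.21   Korean: "she"
její            1.18   Czech: "her"
hennes          1.18   Swedish/Norwegian: "her"
shes            1.18
haar            1.16   Dutch: "her"
彼女は          1.12   Japanese: "she" + topic particle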
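The two logit lists read as a ranking of how strongly this feature pushes individual output tokens up (positive) or down (negative). A common way to obtain such scores, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the k largest and k smallest entries. The minimal stdlib sketch that follows does exactly that; `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical names and data for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed inputs (hypothetical): a feature's decoder direction of length d_model,
# and a mapping from each vocabulary token to its unembedding column (also d_model).


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot the decoder direction with every token's unembedding column."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    positive = list(reversed(ranked))[:k]  # largest score first
    negative = ranked[:k]                  # most negative score first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example; these vectors are made up, not taken from the model.
    decoder_dir = [1.0, -0.5]
    unembed = {"her": [1.5, -0.2], "his": [-0.6, 0.4], "the": [0.1, 0.1]}
    effects = logit_effects(decoder_dir, unembed)
    positive, negative = top_and_bottom(effects, k=1)
    print("positive:", positive)
    print("negative:", negative)
```

In a real model the same two steps would run over the full vocabulary with a matrix library, but the ranking logic is unchanged.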
Activation density: 1.966%
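Activation density is usually read as the share of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the exact threshold and sampling used for this page are not stated, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed 0)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation to compute a density")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


if __name__ == "__main__":
    # Made-up activations over 10 token positions; the feature fires once.
    sample = [0.0, 0.0, 3.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(f"Activation density: {activation_density(sample):.3f}%")  # prints: Activation density: 10.000%
```

Under this definition, a density of 1.966% would mean the feature fires on roughly 1 in 51 tokens.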