INDEX
Explanations
references to male characters or entities
Negative Logits
Token       Logit
comod       -0.69
auc         -0.67
aspec       -0.65
Pria        -0.65
Źródło      -0.65
Atas        -0.65
mín         -0.64
aig         -0.63
Monfieur    -0.63
Inscrivez   -0.62
Positive Logits
Token       Logit
he          2.00
He          1.75
He          1.69
she         1.65
himself     1.45
she         1.38
She         1.37
his         1.33
himself     1.30
She         1.29
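The page does not say how the logit lists are produced. A common approach for SAE feature dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and report the k lowest and k highest entries. The sketch below shows that assumption in plain Python. The function `logit_effects`, its parameters, and the toy vocabulary and weights are hypothetical, not taken from the dashboard.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: project a feature's decoder direction onto the
    unembedding and return the k most negative and k most positive
    (token, logit) pairs."""
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")
    # logits[j] = sum_i decoder_dir[i] * unembed[i][j]
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort tokens by their logit effect, lowest first.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy example with a 2-dimensional residual stream and 4 tokens.
    neg, pos = logit_effects(
        [1.0, 0.5],
        [[2.0, 1.0, -0.5, 0.0], [0.0, 1.0, -0.5, 0.25]],
        ["he", "she", "auc", "the"],
        k=2,
    )
    print(pos)  # [('he', 2.0), ('she', 1.5)]
    print(neg)  # [('auc', -0.75), ('the', 0.125)]
```

In a real model the decoder direction and unembedding would come from the trained SAE and the base model. Note that in the toy run the k-lowest list can include tokens with positive values, since it simply takes the bottom of the ranking.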
Activation Density: 0.249%
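Activation density is read here, as an assumption, as the percentage of token positions in the evaluation sample where the feature's activation is above zero. Under that reading, 0.249% means the feature fires on roughly 1 token in 400. The sketch below computes the metric for a list of per-token activations; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: return the percentage of token positions where
    the feature's activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # The feature fires on 1 of 4 tokens.
    print(activation_density([0.0, 0.0, 3.2, 0.0]))  # 25.0
```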