Explanations
references to female characters and their roles in narratives or relationships
Negative Logits
himself
-0.68
his
-0.52
his
-0.52
他的
-0.41
zijn
-0.41
jeho
-0.36
seinem
-0.35
jego
-0.35
его
-0.35
妻
-0.35
Positive Logits
herself
0.94
her
0.62
haar
0.50
její
0.44
她的
0.44
hers
0.42
she
0.41
ей
0.41
ее
0.40
её
0.40
Activations Density 0.669%