INDEX
Explanations
words related to a specific female character named "her" or pronouns indicating her
references to a specific individual, suggesting focus on personal narratives or achievements
Negative Logits
ypes        -0.99
ype         -0.99
ornia       -0.82
kefeller    -0.78
arios       -0.75
isco        -0.69
anamo       -0.69
ollo        -0.69
hovah       -0.68
undo        -0.67
Positive Logits
husband      1.50
ding         1.32
cule         1.29
metic        1.25
herself      1.16
boyfriend    1.10
ded          1.09
nia          1.06
daughter     1.06
Majesty      1.04
Activations Density 0.136%
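
The density figure above is commonly read as the fraction of token positions on which the feature fires at all. The sketch below illustrates that reading only; the function name, the zero threshold, and the sample data are assumptions made for illustration, not the dashboard's actual computation.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a non-zero
    activation; the source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, of which 14 activate the feature.
sample = [0.0] * 9986 + [2.5] * 14
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.136% would mean the feature is active on roughly 1.4 of every 1,000 tokens.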