INDEX
Explanations
pronouns 'her' and 'she'
references to a specific individual and their experiences
Negative Logits
Token       Logit
ype         -1.00
ypes        -0.99
ornia       -0.81
kefeller    -0.75
VERTIS      -0.71
isco        -0.71
ollo        -0.67
arios       -0.66
flex        -0.65
undo        -0.65
Positive Logits
Token           Logit
husband         1.72
boyfriend       1.28
metic           1.27
cule            1.23
ding            1.22
herself         1.22
daughter        1.19
vagina          1.10
maiden          1.09
granddaughter   1.08
Activations Density: 0.122%