INDEX
Explanations
phrases or words related to interactions with or actions towards a specific female character
references to a specific female subject
Negative Logits
ype            -1.00
ypes           -0.93
Ĥª             -0.77
ornia          -0.77
arios          -0.76
hovah          -0.75
kefeller       -0.74
anamo          -0.74
ffic           -0.73
ormons         -0.72
Positive Logits
husband         1.41
daughter        1.35
granddaughter   1.25
niece           1.23
boyfriend       1.19
sister          1.16
breasts         1.12
daughters       1.12
purse           1.08
grandson        1.08
Activations Density 0.098%