INDEX
Explanations
phrases related to personal interactions with a female individual
Negative Logits
ypes      -1.04
ype       -0.98
kefeller  -0.92
ornia     -0.87
arios     -0.82
VERTIS    -0.76
ollo      -0.74
ormons    -0.72
ozy       -0.70
eers      -0.70
Positive Logits
ding           1.29
husband        1.23
own            1.20
metic          1.17
cule           1.16
daughter       1.16
nia            1.07
granddaughter  1.05
Majesty        1.04
ded            1.04
Activations Density 0.113%