INDEX
Explanations
phrases related to a specific female individual
references to a specific female character and her experiences
Negative Logits
arios         -0.83
ablishment    -0.82
ozy           -0.81
ypes          -0.79
ornia         -0.77
±             -0.77
ype           -0.76
:/            -0.72
uck           -0.70
oft           -0.69
Positive Logits
own             1.21
granddaughter   1.10
ding            1.09
daughter        1.08
metic           1.04
itage           1.00
niece           1.00
daughters       0.97
grandchildren   0.96
assailant       0.96
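Logit lists like the two above are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and taking the most negative and most positive entries. The sketch below assumes that direct-path computation (it ignores the final LayerNorm and any later layers); `top_logit_effects`, `w_dec_row`, and `W_U` are hypothetical names, not this dashboard's actual code.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Rank tokens by how strongly a feature's decoder direction raises
    (positive) or lowers (negative) their logits.

    Assumption: direct path only, effect = w_dec_row @ W_U
    (final LayerNorm and downstream layers are ignored).
    """
    effects = np.asarray(w_dec_row) @ np.asarray(W_U)  # shape: (vocab_size,)
    order = np.argsort(effects)  # token indices, most negative effect first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: a 2-dim residual stream and a 3-token vocabulary.
W_U = np.array([[1.0, -1.0, 0.5],
                [0.0,  0.0, 0.0]])
w_dec_row = np.array([2.0, 0.0])
positive, negative = top_logit_effects(w_dec_row, W_U, ["a", "b", "c"], k=2)
print("positive:", positive)
print("negative:", negative)
```

With real weights, `vocab` would be the tokenizer's decoded token strings, which is why subword fragments such as "ype" or "itage" can appear in these lists.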
Activation Density: 0.100%
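Activation density is the fraction of token positions on which the feature fires, so 0.100% corresponds to roughly 1 token in 1,000. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold is an assumption, and `activation_density` is a hypothetical helper name):

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy example: the feature fires on 1 of 1,000 token positions.
acts = np.zeros(1000)
acts[7] = 3.2
density = activation_density(acts)
print(f"{density:.3%}")
```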