INDEX
Explanations
references to a specific female individual in different contexts or situations
references to a specific individual, particularly in relation to events or actions involving that person
Negative Logits
ype             -0.88
ypes            -0.85
ozy             -0.82
kefeller        -0.80
arios           -0.80
eers            -0.79
ornia           -0.78
ablishment      -0.76
eering          -0.74
uminati         -0.69
Positive Logits
own             1.32
ding            1.20
daughter        1.10
metic           1.08
cule            1.05
granddaughter   1.04
Majesty         1.04
husband         1.03
itage           0.99
sister          0.99
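The negative and positive logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. Dashboards of this kind commonly obtain such scores by projecting the feature's decoder direction through the model's unembedding matrix. Whether this dashboard does exactly that (for example, with or without folding in a final layer norm) is an assumption, and the names `w_dec_row`, `w_u`, and `logit_effects` below are hypothetical. The following is a minimal sketch under that assumption:

```python
import numpy as np

def logit_effects(w_dec_row, w_u, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumes (hypothetically) that w_dec_row is the feature's decoder vector,
    shape (d_model,), and w_u is the unembedding matrix, shape (d_model, vocab_size).
    """
    scores = w_dec_row @ w_u          # direct effect on each token's logit
    order = np.argsort(scores)        # ascending: most negative first
    bottom = [(vocab[i], float(scores[i])) for i in order[:k]]
    top = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return bottom, top

# Toy usage with a 2-dimensional residual stream and a 4-token vocabulary.
w_dec_row = np.array([1.0, 0.0])
w_u = np.array([[0.5, -1.0, 2.0, 0.0],
                [1.0,  1.0, 1.0, 1.0]])
bottom, top = logit_effects(w_dec_row, w_u, ["a", "b", "c", "d"], k=2)
print(bottom)  # [('b', -1.0), ('d', 0.0)]
print(top)     # [('c', 2.0), ('a', 0.5)]
```

Because only the decoder direction and the unembedding are involved, these scores describe the feature's direct effect on the output logits and ignore any influence routed through later layers.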
Activation Density: 0.108%
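Activation density is usually reported as the fraction of sampled tokens on which the feature fires. The sketch below assumes that "fires" means a strictly positive activation, and that the figure is computed over a fixed token sample. Both points are assumptions about this dashboard, and `activation_density` is a hypothetical helper. Under that reading, 0.108% corresponds to roughly one active token in every 926.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed 0)."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# Toy sample: 27 active tokens out of 25,000 gives a density of 0.108%.
acts = np.zeros(25_000)
acts[:27] = 1.0
print(f"{activation_density(acts) * 100:.3f}%")  # 0.108%
```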