INDEX
Explanations
proper nouns and significant elements associated with pop culture and notable events
Negative Logits
wife
-0.31
妻
-0.31
wives
-0.28
Wife
-0.25
wife
-0.23
prostate
-0.22
beard
-0.20
-wife
-0.19
girlfriend
-0.19
兄弟
-0.19
Positive Logits
Husband
0.25
husbands
0.24
husband
0.23
herself
0.22
丈夫
0.22
her
0.21
chồng
0.20
lesb
0.20
vagina
0.18
feminism
0.18
Activations Density 0.169%