INDEX
Explanations
references to familial relationships and connections
Negative Logits
boyfriend    -0.22
aida         -0.17
eck          -0.17
oser         -0.16
emek         -0.16
Cousins      -0.15
idon         -0.15
Mothers      -0.15
Ñıб          -0.15
Fathers      -0.15
Positive Logits
wife         0.60
Wife         0.48
wife         0.44
妻           0.42
-wife        0.40
wives        0.34
esposa       0.31
vợ           0.30
Mrs          0.28
spouse       0.27
Activations Density 0.120%
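
The density figure above can be read as the fraction of token positions on which the feature fires. The sketch below is illustrative only: it assumes density means "share of positions with activation above zero", which this page does not state, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active. The source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 12 of which are active.
sample = [0.0] * 9988 + [1.5] * 12
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.120%"
```

Under this reading, a density of 0.120% corresponds to roughly 12 active positions per 10,000 tokens.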