INDEX
Explanations
references to social relationships and familial connections
Negative Logits
Token       Logit
妻          -0.77
Ms          -0.77
妻子        -0.76
istrinya    -0.75
Ms          -0.70
老婆        -0.70
彼女        -0.68
koji        -0.66
女友        -0.65
wife        -0.64
Positive Logits
Token       Logit
herself     1.36
husband     1.11
husbands    1.02
suaminya    0.99
herself     0.98
husband     0.98
boyfriends  0.94
chồng       0.90
Husband     0.88
motherhood  0.87
Activations Density 0.493%
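As a rough illustration of how a density figure like the one above could be derived, the sketch below assumes that "activations density" means the share of sampled tokens on which the feature's activation exceeds zero. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: density is the fraction of tokens where the feature fires,
    expressed as a percentage. An empty input yields 0.0.
    """
    values = list(activations)
    if not values:
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


# Made-up sample: 5 active tokens out of 1000.
sample = [0.0] * 995 + [0.3, 1.2, 0.8, 2.1, 0.5]
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.493% would mean the feature fires on roughly 5 of every 1,000 tokens in the sampled text.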