INDEX
Explanations
relationships and interactions within familial and romantic contexts
Negative Logits
Frau      -0.30
wife      -0.30
wives     -0.29
Ms        -0.28
vrouw     -0.28
wife      -0.28
woman     -0.27
Wife      -0.27
妻        -0.27
lady      -0.26
Positive Logits
husband   0.52
husbands  0.51
Husband   0.48
boyfriend 0.46
丈夫      0.44
male      0.41
males     0.39
муж       0.39
hubby     0.39
hus       0.38
Activations Density 0.612%