INDEX
Explanations
references to family relationships and connections
Negative Logits
Daughter        -0.24
daughter        -0.18
Wife            -0.17
granddaughter   -0.16
daughter        -0.16
Sons            -0.16
wife            -0.15
妻              -0.15
Heller          -0.15
wife            -0.15
Positive Logits
Unc             0.61
uncle           0.60
unc             0.60
Unc             0.57
Uncle           0.56
aunt            0.52
_unc            0.52
Aunt            0.47
UNC             0.44
relative        0.43
Activations Density 0.329%