INDEX
Explanations
references to family relationships and personal connections
Negative Logits
granddaughter    -0.25
grandson         -0.24
grandchildren    -0.23
Daughter         -0.22
sons             -0.19
Son              -0.18
SON              -0.18
Son              -0.18
daughter         -0.18
子               -0.17
Positive Logits
unc      0.17
folks    0.16
_tgt     0.15
IRT      0.15
orts     0.14
ign      0.14
acket    0.14
iente    0.14
igin     0.14
589      0.14
Activations Density 0.113%