Explanations
possessive pronouns and related expressions indicating familial or personal relationships
Negative Logits
Token            Logit
grandson         -0.36
granddaughter    -0.33
grandchildren    -0.32
Daughter         -0.29
Son              -0.28
Sons             -0.28
sons             -0.27
Son              -0.27
husbands         -0.25
sons             -0.24
Positive Logits
Token            Logit
mother           0.28
parents          0.28
folks            0.25
father           0.25
step             0.24
older            0.23
mother           0.20
sister           0.19
farther          0.19
birth            0.18
Activations Density 0.165%
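
The density figure above can be read as the share of sampled token positions on which the feature fires. The sketch below illustrates that reading only. It assumes density means the fraction of positions with an activation above zero. The function name `activation_density`, the threshold parameter, and the synthetic sample are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled token positions on which
    the feature is active at all; the dashboard's exact definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic, hypothetical sample: 33 active positions out of 20,000.
sample = [0.0] * 19967 + [1.0] * 33
density = activation_density(sample)
print(f"{density:.3%}")
```

On this reading, a density of 0.165% would correspond to roughly 1 active position in every 600 sampled tokens.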