INDEX
Explanations
references to possession and relationships in familial or parental contexts
Negative Logits
sisters          -0.26
siblings         -0.23
grandfather      -0.21
cousins          -0.21
granddaughter    -0.21
Cousins          -0.20
Sisters          -0.20
brothers         -0.19
sister           -0.19
Brothers         -0.19
Positive Logits
child            0.17
beloved          0.16
athlete          0.15
child            0.15
Child            0.15
Child            0.15
æķı              0.14
ÎŃÏģ             0.14
darling          0.14
kid              0.14
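
The entries æķı and ÎŃÏģ in the positive-logit list look like byte-level BPE token strings rather than corrupted text. The listing does not say which tokenizer produced it, so the following is only a minimal sketch, assuming the GPT-2 style byte-to-unicode table; the function names `byte_decoder` and `decode_token` are illustrative and not part of any dashboard API.

```python
def byte_decoder():
    """Build the reverse of the GPT-2 style byte-to-unicode table (assumed tokenizer)."""
    # Bytes that are printable in Latin-1 map to the character with the same code point.
    bs = (
        list(range(0x21, 0x7F))    # '!' .. '~'
        + list(range(0xA1, 0xAD))  # '¡' .. '¬'
        + list(range(0xAE, 0x100)) # '®' .. 'ÿ'
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is assigned the next unused code point from U+0100 upward.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {chr(c): b for b, c in zip(bs, cs)}


def decode_token(token):
    """Turn a byte-level token string back into the UTF-8 text it stands for."""
    table = byte_decoder()
    raw = bytes(table[ch] for ch in token)
    # Tokens can end mid-character, so replace incomplete sequences instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["æķı", "ÎŃÏģ", "child"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, æķı decodes to 敏 and ÎŃÏģ decodes to έρ, while plain ASCII tokens such as child are unchanged. If the model uses a different tokenizer, these readings do not apply.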
Activations Density 0.075%