INDEX
Explanations
references to family dynamics and parental roles
Negative Logits
(completion   -0.17
buddy         -0.16
edia          -0.15
äºĮ人         -0.15
ollipop       -0.15
μη            -0.15
-valu         -0.15
dikke         -0.14
gii           -0.14
)throws       -0.14
Positive Logits
sons          0.40
children      0.37
daughters     0.36
sons          0.29
children      0.29
boys          0.29
Sons          0.28
Children      0.28
biological    0.27
Children      0.25
Activations Density 0.059%