Explanations
references to mothers and maternal figures
Negative Logits
).]              -0.79
ſhall            -0.77
PreferredItem    -0.74
ashqai           -0.74
Asie             -0.73
Destin           -0.73
propOrder        -0.73
%)$              -0.72
}`}>             -0.72
Hermit           -0.71
Positive Logits
mothers          0.86
Mothers          0.78
mother           0.75
Mothers          0.75
MOTHER           0.75
Octo             0.74
MOTHER           0.69
scor             0.68
ofa              0.67
Mother           0.67
Activations Density 0.015%