INDEX
Explanations
references to mothers and mother-child relationships
Negative Logits
Danilo      -0.77
Nye         -0.68
GGLE        -0.65
чесно       -0.65
udia        -0.63
over        -0.63
ség         -0.63
Hendricks   -0.62
Darius      -0.62
skosten     -0.61
Positive Logits
mother      1.84
mothers     1.82
Mother      1.80
MOTHER      1.75
MOTHER      1.71
mother      1.70
Mothers     1.70
Mothers     1.70
Mother      1.68
mothers     1.43
Activations Density: 0.050%