INDEX
Explanations
references to societal expectations and experiences related to motherhood, gender roles, and individual choices
Head Attr Weights
0:0.03
1:0.01
2:0.18
3:0.25
4:0.11
5:0.03
6:0.03
7:0.07
8:0.04
9:0.04
10:0.09
11:0.07
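
The weights above are easier to compare once ranked. The following is a minimal sketch, not part of the original dashboard, that copies the twelve listed values into a dictionary and sorts them. The names `head_weights`, `rank_heads`, and `top_share` are hypothetical helpers introduced only for illustration.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.03, 1: 0.01, 2: 0.18, 3: 0.25, 4: 0.11, 5: 0.03,
    6: 0.03, 7: 0.07, 8: 0.04, 9: 0.04, 10: 0.09, 11: 0.07,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


def top_share(weights, k):
    """Fraction of the total listed weight carried by the k largest heads."""
    total = sum(weights.values())
    top = sum(w for _, w in rank_heads(weights)[:k])
    return top / total


ranked = rank_heads(head_weights)
for head, weight in ranked[:3]:
    print(f"head {head}: {weight:.2f}")
print(f"share of top 2 heads: {top_share(head_weights, 2):.2f}")
```

Heads 3 and 2 carry the largest weights. The listed values sum to about 0.95 rather than 1, presumably because each is rounded to two decimals, so the share is computed against the listed total rather than against 1.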
Negative Logits
FIRE: -1.58
tumblr: -1.56
Clockwork: -1.46
CALL: -1.38
Candy: -1.37
Cobra: -1.36
DRAGON: -1.35
Luther: -1.34
Seymour: -1.32
IRC: -1.31
Positive Logits
anymore: 2.85
nor: 2.62
nor: 1.68
necessarily: 1.67
Enough: 1.53
slightest: 1.53
[undecodable token]: 1.50
imilar: 1.50
Orig: 1.50
confir: 1.49
Activations Density 0.021%
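
The source does not define the density figure. A common reading, assumed here, is the fraction of token positions on which the feature's activation is nonzero. The sketch below computes that quantity for a list of activations. The function `activation_density` and the toy data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all. An empty input yields 0.0.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 2 of them active.
toy_activations = [0.0] * 9998 + [1.7, 0.4]
density = activation_density(toy_activations)
print(f"density: {density:.3%}")
```

Under this reading, a density of 0.021% would mean the feature fires on roughly 2 of every 10,000 tokens.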