INDEX
Explanations
phrases indicating frequency or typicality of actions and characteristics associated with women
Negative Logits
possibly        -0.25
sometimes       -0.24
potentially     -0.23
occasionally    -0.21
sometimes       -0.20
Sometimes       -0.20
possibly        -0.20
иногда          -0.18   (Russian for "sometimes")
Possibly        -0.18
Sometimes       -0.17
Positive Logits
either          0.27
either          0.22
Either          0.21
bý              0.20
unless          0.20
либо            0.19   (Russian for "either"/"or")
Either          0.19
EITHER          0.19
accompanies     0.18
accompanied     0.16
Activations Density 0.487%