Explanations
phrases related to gender stereotypes, protection, and care
phrases that express traditional gender biases and stereotypes about women's roles
Negative Logits
":[{"             -0.54
ERG               -0.50
Canaver           -0.49
odcast            -0.47
Patreon           -0.46
puzzling          -0.45
BILITIES          -0.43
ometimes          -0.42
DragonMagazine    -0.40
Package           -0.40
Positive Logits
)).     1.09
]).     1.00
)."     0.96
%).     0.94
).[     0.90
.).     0.89
?).     0.87
).      0.87
").     0.86
').     0.83
Activations Density 2.483%