INDEX
    Explanations

    phrases related to gender stereotypes, protection, and care

    phrases that express traditional gender biases and stereotypes about women's roles

    Negative Logits
    ":[{"             -0.54
    ERG               -0.50
     Canaver          -0.49
    odcast            -0.47
     Patreon          -0.46
     puzzling         -0.45
    BILITIES          -0.43
    ometimes          -0.42
    DragonMagazine    -0.40
    Package           -0.40
    Positive Logits
    )).               1.09
    ]).               1.00
    )."               0.96
    %).               0.94
    ).[               0.90
    .).               0.89
    ?).               0.87
    ).                0.87
    ").               0.86
    ').               0.83
    Act Density 2.483%

    No Known Activations