INDEX
    Explanations

    phrases related to social issues, especially outdated social norms and gender biases

    references to societal norms and the effects of gender bias

    NEGATIVE LOGITS
    Token             Logit
    "endment"         -0.87
    "etsk"            -0.84
    "ospons"          -0.83
    "hire"            -0.81
    "amen"            -0.79
    "ppa"             -0.76
    "osponsors"       -0.73
    "hots"            -0.72
    " Schn"           -0.72
    "earch"           -0.71

    POSITIVE LOGITS
    Token             Logit
    " stereotypes"    1.74
    " prejudices"     1.67
    " notions"        1.66
    " precon"         1.65
    " assumptions"    1.61
    " norms"          1.49
    " stereotype"     1.47
    " beliefs"        1.45
    " biases"         1.40
    " myths"          1.39

    Act Density 0.408%

    No Known Activations