    Explanations

    gender stereotypes

    references to stereotypes and their impacts

    Negative Logits
    hner        -0.75
    inth        -0.74
    ayan        -0.73
    mits        -0.71
    ositories   -0.69
    kos         -0.68
    metic       -0.68
    endez       -0.67
    gan         -0.66
    mission     -0.65
    Positive Logits
     stereotype     1.22
     stereotypes    1.16
     stereotyp      1.15
    rities          0.94
     caricature     0.84
     prejudice      0.83
     stereotypical  0.82
     clich          0.81
     caric          0.79
    è¦ļéĨĴ          0.76
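    The page does not say how the logit values above are computed. On SAE feature dashboards they are commonly the feature's decoder direction projected through the model's unembedding matrix, then sorted. The sketch below assumes exactly that; the function names `logit_effects` and `top_and_bottom`, and the toy inputs, are hypothetical and not taken from this page or any library.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature direction (length d_model) through an unembedding
    matrix (d_model rows x vocab_size columns), giving one score per token.
    Assumption: this is how the dashboard's logit values are derived."""
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j in range(vocab_size)
    ]


def top_and_bottom(vocab: Sequence[str], scores: Sequence[float],
                   k: int = 10) -> Tuple[List[Tuple[str, float]],
                                         List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return negative, positive


# Toy example with a 2-dimensional model and a 3-token vocabulary.
direction = [1.0, -0.5]
unembed = [[0.2, 1.0, -0.3],
           [0.4, -1.0, 0.6]]
vocab = ["a", "b", "c"]
scores = logit_effects(direction, unembed)
negative, positive = top_and_bottom(vocab, scores, k=1)
print(negative, positive)
```

    With real weights, `k=10` would yield two ten-row lists like the ones above; a real model would use a tensor library rather than nested Python loops.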
    Activation Density: 0.013%
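    Activation density is read here as the percentage of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the page does not state its threshold); `activation_density` is a hypothetical helper, not part of any library.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 13 active positions out of 100,000 tokens gives 0.013%.
acts = [0.0] * 100_000
for i in range(13):
    acts[i] = 1.0
density = activation_density(acts)
print(f"{density:.3f}%")
```

    At 0.013%, the feature would be active on roughly 1 token in 7,700, which is consistent with a narrowly scoped feature such as one about stereotypes.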

    No Known Activations