INDEX
    Explanations

    words related to gender

    references to gender and gender identity

    Negative Logits
    "ernels"    -0.75
    "BLIC"      -0.73
    " Grave"    -0.71
    " Gerr"     -0.68
    "esm"       -0.67
    "edia"      -0.67
    "enium"     -0.67
    "ģĸ"        -0.67
    "akings"    -0.66
    "lege"      -0.65
    Positive Logits
    " dysph"        1.29
    " equality"     1.16
    " imbalance"    1.10
    " pronouns"     1.09
    " identity"     1.05
    " stereotypes"  1.04
    " Equality"     1.01
    " stereotyp"    0.99
    "flu"           0.98
    " symmetry"     0.98
    Activation Density: 0.030%

    No Known Activations