INDEX
    Explanations

    keywords related to gender identity and gender-related topics

    references to gender in various contexts

    Negative Logits
    " Gerr"          -0.73
    " Lent"          -0.71
    "Gi"             -0.70
    "ernels"         -0.67
    " Prosecut"      -0.66
    "WT"             -0.66
    " Mub"           -0.64
    "BLIC"           -0.64
    " Grave"         -0.64
    "amina"          -0.64
    Positive Logits
    " dysph"          1.17
    " equality"       1.00
    " pronouns"       0.96
    " Equality"       0.96
    " identity"       0.90
    " stereotypes"    0.89
    " imbalance"      0.85
    "flu"             0.85
    "bending"         0.85
    "bender"          0.83
    Activation Density: 0.028%

    No Known Activations