    Explanations

    concepts related to gender

    terms related to gender identity and gender disparities

    Negative Logits
    BLIC          -0.75
    iries         -0.72
    Gi            -0.71
    ernels        -0.71
    ournal        -0.68
     Mub          -0.68
    Interstitial  -0.68
     Warrant      -0.67
    amina         -0.67
     Gerr         -0.67
    Positive Logits
     dysph         1.06
     Equality      0.95
     pronouns      0.93
     equality      0.92
     stereotypes   0.88
    bender         0.86
     identity      0.85
    bent           0.84
    bending        0.84
    fuck           0.83
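    Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction onto each vocabulary token's unembedding vector and keeping the largest and smallest scores. The page does not say how its lists were computed, so what follows is a minimal sketch under that assumption. The function names, `decoder_row`, `unembed_columns`, and the toy numbers are all hypothetical and are not the real model's weights.

```python
# Minimal sketch, ASSUMING logit lists come from (decoder direction) . (unembedding column).
# All names and numbers here are hypothetical illustrations, not real model weights.

def logit_effects(decoder_row, unembed_columns, vocab):
    """Return {token: logit contribution} for a single feature direction.

    decoder_row: the feature's direction in the residual stream (length d_model)
    unembed_columns: one d_model vector per vocabulary token
    vocab: token strings aligned with unembed_columns
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_row, col))
        for tok, col in zip(vocab, unembed_columns)
    }


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy data: a 2-dimensional "residual stream" and a 4-token vocabulary.
vocab = [" equality", " pronouns", "BLIC", " Warrant"]
unembed_columns = [[1.0, 0.0], [0.5, 0.5], [-1.0, 0.0], [0.0, -1.0]]
decoder_row = [0.9, 0.2]

effects = logit_effects(decoder_row, unembed_columns, vocab)
positive, negative = top_and_bottom(effects, k=2)
print("Positive:", [tok for tok, _ in positive])  # [' equality', ' pronouns']
print("Negative:", [tok for tok, _ in negative])  # ['BLIC', ' Warrant']
```

    Note that the leading space in tokens such as " equality" is part of the token string itself, which is why the lists above distinguish " equality" from "bending".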
    Activation Density: 0.030%
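    Activation density is usually read as the percentage of token positions where the feature fires, so 0.030% corresponds to about 3 active tokens in every 10,000. The page does not state its exact threshold, so the sketch below assumes "fires" means an activation strictly greater than zero. `activation_density` and the toy activation list are hypothetical.

```python
# Minimal sketch, ASSUMING "active" means activation > threshold (default 0.0).
# activation_density and the toy data are hypothetical, for illustration only.

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 3 active positions out of 10,000 tokens.
acts = [0.0] * 10_000
for i in (17, 4_210, 9_999):
    acts[i] = 1.5

print(f"Activation Density: {activation_density(acts):.3f}%")  # Activation Density: 0.030%
```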

    No Known Activations