INDEX
    Explanations

    words related to gender

    mentions of gender and related topics

    Negative Logits
    " GOODMAN"   -0.73
    "amina"      -0.70
    "BLIC"       -0.70
    " Provided"  -0.68
    " Warrant"   -0.68
    "steen"      -0.67
    " Memor"     -0.67
    "iries"      -0.66
    "etsk"       -0.66
    "ernels"     -0.65
    Positive Logits
    " dysph"        1.02
    "endered"       0.96
    " genders"      0.93
    " gender"       0.89
    " equality"     0.89
    "fuck"          0.87
    "bender"        0.85
    " pronouns"     0.84
    " imbalance"    0.84
    " stereotypes"  0.83
    Activation Density: 0.016%

    No Known Activations