INDEX
    Explanations

    mentions of gender-related topics

    references to gender-related topics and issues

    Negative Logits
    Token             Logit
    "BLIC"            -0.77
    "amina"           -0.71
    "ernels"          -0.70
    " Warrant"        -0.69
    " Showtime"       -0.67
    "Gi"              -0.66
    "iries"           -0.66
    " Mub"            -0.65
    "steen"           -0.65
    "Interstitial"    -0.65

    Positive Logits
    Token             Logit
    " dysph"           1.17
    " pronouns"        0.95
    " equality"        0.90
    " Equality"        0.88
    "bender"           0.87
    " identity"        0.87
    "fuck"             0.87
    " stereotypes"     0.86
    " imbalance"       0.85
    "endered"          0.84
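    How the page derives these two lists is not stated. A common way to produce such lists for a sparse-autoencoder feature is to project the feature's decoder vector through the model's unembedding matrix and sort the resulting per-token scores. The sketch below assumes that method. It ignores any final LayerNorm, and its names (`decoder_vec`, `W_U`, `top_logit_tokens`) and toy numbers are hypothetical.

```python
import numpy as np


def top_logit_tokens(decoder_vec, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    Assumption: the effect is approximated as decoder_vec @ W_U, ignoring
    any final LayerNorm; the dashboard's exact method is not stated.
    """
    logits = decoder_vec @ W_U                 # one score per vocabulary token
    order = np.argsort(logits)                 # indices sorted ascending
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example (hypothetical numbers): d_model = 2, vocabulary of 4 tokens.
decoder_vec = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 1.2, 0.0],
                [9.0, 9.0, 9.0, 9.0]])
vocab = [" equality", "amina", " dysph", " Showtime"]
neg, pos = top_logit_tokens(decoder_vec, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

    Sorting once with `argsort` and reading both ends yields the negative and positive lists from a single pass over the vocabulary.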
    Activation density: 0.022%
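    A density of 0.022% means the feature fires on roughly 22 of every 100,000 tokens. The sketch below assumes density is the share of token positions whose activation exceeds a threshold of zero. The page does not state the corpus or threshold it uses. The function name and the data are hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires above `threshold`.

    Assumption: "density" means the share of tokens with activation > threshold
    over some evaluation corpus; the corpus and threshold here are hypothetical.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))


# Hypothetical data: 22 nonzero activations among 100,000 tokens.
acts = np.zeros(100_000)
acts[:22] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.022%
```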

    No Known Activations