INDEX
    Explanations

    references to gender-related topics in various contexts

    references to gender and gender-related topics

    Negative Logits
    "ernels"      -0.77
    "BLIC"        -0.73
    "WT"          -0.72
    " Lent"       -0.68
    " Gerr"       -0.68
    "Gi"          -0.67
    " Warrant"    -0.67
    "edia"        -0.65
    "iths"        -0.65
    " Grave"      -0.65
    Positive Logits
    " dysph"             1.22
    " equality"          1.07
    " imbalance"         0.98
    " pronouns"          0.97
    " identity"          0.96
    " Equality"          0.96
    " stereotypes"       0.94
    "flu"                0.91
    " discrimination"    0.91
    " inequality"        0.90
    Activation Density 0.046%

    No Known Activations