    Explanations

    references to societal roles and stereotypes

    Negative Logits
        "μμ"            -0.16
        "oller"         -0.16
        ".echo"         -0.15
        " GURL"         -0.15
        "rana"          -0.14
        "рава"          -0.14
        "zw"            -0.14
        "itung"         -0.14
        "hlen"          -0.14
        "cul"           -0.14
    Positive Logits
        " stereotype"    0.28
        " stereotypes"   0.25
        " stere"         0.25
        " Ster"          0.23
        " ster"          0.18
        "pec"            0.17
        "asso"           0.16
        " stereo"        0.15
        " pigeon"        0.15
        " vil"           0.15
    Activation Density: 0.028%
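Activation density is usually read as the fraction of token positions in a sample on which the feature fires. At 0.028% (0.00028), that works out to roughly 28 active positions per 100,000 tokens. The page gives neither the firing threshold nor the sample size. The sketch below therefore assumes that "fires" means an activation strictly above 0, and the helper name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active only when its activation is
    strictly greater than the threshold (0 by default).
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 28 firing positions out of 100,000 tokens.
    sample = [1.0] * 28 + [0.0] * 99_972
    print(f"{activation_density(sample):.3%}")  # prints 0.028%
```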

    No Known Activations