INDEX
    Explanations

    references to women or feminine pronouns

    mentions of the pronoun "her" in various contexts

    Negative Logits
    "ured"        -0.61
    " bluff"      -0.60
    " curfew"     -0.59
    "fty"         -0.58
    "elta"        -0.57
    "govtrack"    -0.57
    "prints"      -0.55
    " Cage"       -0.55
    " prints"     -0.54
    " sock"       -0.54
    Positive Logits
    "itage"       1.22
    "tz"          1.19
    "ding"        1.07
    "ald"         0.97
    "pes"         0.96
    "itance"      0.90
    "jee"         0.89
    "mite"        0.88
    "lich"        0.88
    "mone"        0.87
    Activation Density 0.046%

    No Known Activations