INDEX
    Explanations

    pronouns indicating gender

    references to gender pronouns, particularly focusing on "he" and "she."

    Negative Logits
     Mub        -0.70
    Joy         -0.70
     Vil        -0.69
     Vive       -0.69
     Bacon      -0.65
     Delicious  -0.63
     CLR        -0.61
     Rusty      -0.61
     Yar        -0.61
     Downs      -0.60
    Positive Logits
    self        0.96
     own        0.80
    selves      0.77
    itage       0.73
    issance     0.72
    agon        0.70
     spouse     0.69
    acht        0.66
    ynasty      0.66
    gdala       0.66
    Act Density 0.034%

    No Known Activations