INDEX
    Explanations

    pronouns referring to a male person

    repeated references to a specific individual

    Negative Logits
    "earch"      -0.78
    " Peak"      -0.68
    "higher"     -0.66
    "awar"       -0.64
    "rame"       -0.64
    "veyard"     -0.63
    "peak"       -0.62
    "tones"      -0.62
    "aura"       -0.62
    "reshold"    -0.62
    Positive Logits
    "'ll"        1.22
    "'d"         1.20
    "zbollah"    1.06
    " wrote"     1.02
    " tweeted"   1.02
    "'ve"        0.90
    "resy"       0.89
    " joked"     0.89
    " wondered"  0.88
    " penned"    0.88
    Activation Density: 0.259%

    No Known Activations