INDEX
    Explanations

    female pronouns

    mentions of a female subject in various contexts

    Negative Logits
     Jimmy      -0.62
    fit         -0.61
    ugu         -0.61
    ensable     -0.60
    Outside     -0.60
     Jindal     -0.58
     Vers       -0.58
    full        -0.57
    rax         -0.56
    reprene     -0.56
    Positive Logits
    pher        1.23
    pherd       1.08
    pard        1.01
    athed       0.97
    ffield      0.96
    'll         0.89
    athing      0.87
    'd          0.87
    metic       0.86
    ppard       0.83
    Activation Density: 0.070%

    No Known Activations