INDEX
    Explanations

    mentions of individuals by their gender-neutral pronouns

    references to individuals and their roles or achievements

    Negative Logits
    Token         Logit
    "Untitled"    -0.73
    " bothering"  -0.71
    "Battery"     -0.63
    "Enlarge"     -0.62
    "mma"         -0.61
    "★★"          -0.60
    "••"          -0.60
    " pregn"      -0.60
    "Leaks"       -0.60
    " cliché"     -0.60
    Positive Logits
    Token         Logit
    " also"       0.94
    "pherd"       0.91
    "theless"     0.88
    "miah"        0.88
    " consists"   0.86
    " consisted"  0.84
    " comprises"  0.83
    " graduated"  0.81
    "ffield"      0.80
    "'ll"         0.79
    Activation Density: 0.305%

    No Known Activations