    Explanations

    Gendered third-person pronouns

    Negative Logits
    umpt          -0.15
    lez           -0.15
    yers          -0.14
    ān            -0.14
    umper         -0.14
     cosine       -0.14
    edral         -0.14
    agues         -0.14
    ask           -0.13
     dela         -0.13

    Positive Logits
    isman          0.17
    òi             0.16
    stří           0.16
    HEEL           0.15
    ston           0.14
    是我           0.14
    dog            0.14
    ron            0.14
    PackageName    0.14
     itemprop      0.14
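The Positive and Negative Logits lists above are typically the vocabulary tokens whose logits this feature most increases or decreases directly. A minimal sketch of how such lists can be produced, assuming the feature has a decoder vector in the residual stream and the model has an unembedding matrix W_U (d_model rows by vocab columns). The names `logit_effects`, `top_logits`, `decoder_vec`, and the toy numbers are hypothetical, and a real dashboard may apply a final layer norm or other scaling first.

```python
# Minimal sketch, not the dashboard's actual code. Assumptions (hypothetical):
# decoder_vec is the feature's d_model-dimensional decoder direction and
# W_U is the unembedding matrix stored as d_model rows x vocab columns.

def logit_effects(decoder_vec, W_U):
    """Direct logit effect per vocabulary token: decoder_vec @ W_U."""
    d_model = len(decoder_vec)
    vocab_size = len(W_U[0])
    return [
        sum(decoder_vec[i] * W_U[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_logits(effects, vocab, k=10):
    """Return the k most positive and k most negative (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example with made-up numbers (d_model = 2, vocabulary of 4 tokens).
vocab = ["dog", "cat", "ask", "ron"]
W_U = [
    [0.5, -0.2, 0.1, 0.0],
    [0.1, 0.3, -0.4, 0.2],
]
decoder_vec = [0.2, 0.1]

positive, negative = top_logits(logit_effects(decoder_vec, W_U), vocab, k=2)
print(positive)  # dog first, then ron
print(negative)  # ask first, then cat
```

Sorting the whole vocabulary twice keeps the sketch simple; for a vocabulary of tens of thousands of tokens, `heapq.nlargest` and `heapq.nsmallest` with the same `key` avoid a full sort.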
    Activation Density: 0.706%
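Activation density is generally the fraction of sampled tokens on which the feature is active, so 0.706% corresponds to roughly 7 tokens in every 1,000. The sketch assumes "active" means an activation strictly above zero; the function name `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's exact sampling and threshold are not stated here.

```python
# Minimal sketch: "active" is assumed to mean activation > threshold (0 by default).

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: 3 active positions out of 1,000 tokens.
acts = [0.0] * 997 + [1.2, 0.4, 3.1]
print(activation_density(acts))  # 0.3
```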

    No Known Activations