INDEX
    Explanations

    references to individuals' identities and how they are perceived by others

    Negative Logits
    rary          -0.16
    anas          -0.16
     wording      -0.15
     motive       -0.14
    ande          -0.14
    opy           -0.14
    illon         -0.14
    åħ¸           -0.13
    enu           -0.13
     Zy           -0.13
    Positive Logits
     nick         0.24
    nick          0.23
    nickname      0.21
     Nick         0.20
     nickname     0.19
    .nickname     0.18
     shortened    0.18
     shorter      0.17
     rever        0.17
     informal     0.17
    Activation Density: 0.078%

    No Known Activations