INDEX
    Explanations

    phrases related to self-identification or attribution of identity

    phrases indicating self-identification, especially in relation to gender and identity

    Negative Logits
    ersen       -0.85
    ysc         -0.84
    terness     -0.71
    erest       -0.70
    TPS         -0.69
    Plex        -0.69
    TL          -0.68
    ttp         -0.67
    ipl         -0.67
    Side        -0.66
    Positive Logits
     belonging  0.89
    pires       0.83
    pired       0.77
     follows    0.69
    pers        0.68
     Die        0.68
    ©¶æ         0.68
     Commando   0.67
     Burk       0.65
     Nig        0.64
    Activation Density 0.065%

    No Known Activations