    Explanations

    references to queer identity and related terms

    Negative Logits
    Token    Logit
    nya      -0.18
    shape    -0.18
    ship     -0.16
    orne     -0.16
    son      -0.16
    sw       -0.16
    ly       -0.16
    sun      -0.16
    s        -0.15
    li       -0.15
    Positive Logits
    Token    Logit
    uing     0.29
    ued      0.28
    bec      0.28
    ues      0.28
    ens      0.21
    erness   0.21
    uetype   0.20
    UES      0.19
    estion   0.19
    ENS      0.17
    Activation Density: 0.008%

    No Known Activations