INDEX
    Explanations

    mentions of LGBTQ organizations or related terms

    references to the queer community and related terminology

    Negative Logits
    Token           Logit
    "ãĥŁ"           -0.74
    " GOODMAN"      -0.71
    "ãĥ¼ãĥĨãĤ£"     -0.69
    "é¾įå"          -0.68
    " batting"      -0.66
    "ERSON"         -0.65
    "ãĤ¼ãĤ¦ãĤ¹"     -0.65
    "uania"         -0.65
    " eleph"        -0.65
    " McDonnell"    -0.64
    Positive Logits
    Token           Logit
    "zon"           1.14
    " Que"          1.00
    "erness"        0.99
    "que"           0.98
    "Que"           0.97
    "ues"           0.94
    "bec"           0.90
    "eg"            0.88
    "edo"           0.85
    "ue"            0.84
    Activation Density: 0.007%

    No Known Activations