INDEX
    Explanations

    references to violence and events related to the LGBT community

    Negative Logits
    ysz
    -0.18
     Malay
    -0.17
     Malaysian
    -0.15
     Malaysia
    -0.15
     корол
    -0.14
     Boone
    -0.14
    ź
    -0.14
    lew
    -0.14
     Indonesian
    -0.14
     Mohammed
    -0.13
    Positive Logits
     Georgia
    0.40
     Georgian
    0.39
    Georgia
    0.36
     груз
    0.29
    áĥ
    0.27
     Kak
    0.27
     Bat
    0.25
     Georg
    0.24
     Caucas
    0.24
     Caucasian
    0.24
    Act Density 0.013%

    No Known Activations