INDEX
    Explanations

    phrases related to discrimination and negative attitudes towards particular groups

    references to homophobia and related social issues

    Negative Logits
    Token        Logit
     Chocobo     -0.72
    tnc          -0.71
    ibaba        -0.70
    atche        -0.70
    istors       -0.70
    olicited     -0.68
    ujah         -0.67
    uce          -0.65
    agra         -0.64
    zinski       -0.64
    Positive Logits
    Token        Logit
    §            0.78
    urst         0.68
    stadt        0.66
    ĭ            0.65
    esse         0.64
    resy         0.64
    esy          0.63
    y            0.63
    ĩ            0.63
    =-=-=-=-     0.62
    Activation Density: 0.042%

    No Known Activations