    Explanations

    phrases related to disrespect, insult, or humiliation towards individuals or groups

    instances of disrespect or hostility towards individuals or groups

    Negative Logits
    minster         -0.71
    forward         -0.70
    kj              -0.69
    ogg             -0.69
    HEAD            -0.66
    combe           -0.65
    soDeliveryDate  -0.65
    atonin          -0.64
    aunder          -0.64
    Netflix         -0.63
    Positive Logits
     gays         0.95
     homosexuals  0.93
     minorities   0.93
     Mexicans     0.89
     Muslims      0.85
     foreigners   0.85
     humanity     0.85
     Hispanics    0.84
     Arabs        0.84
     anyone       0.83
    Activation Density: 0.259%

    No Known Activations