INDEX
    Explanations

    derogatory language and discriminatory remarks toward individuals or groups

    derogatory language and offensive comments

    Negative Logits
    Luck        -0.75
    ELY         -0.70
    oglu        -0.70
    arten       -0.69
     Luck       -0.69
    INC         -0.68
    ellect      -0.67
    ederal      -0.66
     Mechan     -0.65
    UNCH        -0.65
    Positive Logits
     slurs                1.09
     lewd                 1.07
     uttered              0.98
     derogatory           0.97
     harassing            0.97
     insulting            0.96
     inappropriately      0.96
     inappropriate        0.94
     indecent             0.93
     homophobic           0.92
    Activation Density: 0.340%

    No Known Activations