INDEX
    Explanations

    derogatory terms or insults directed towards individuals

    derogatory terms and insults aimed at individuals or groups

    Negative Logits
    Token              Logit
    " conclud"         -0.72
    " Printed"         -0.71
    " survives"        -0.71
    " Surviv"          -0.70
    "foreseen"         -0.68
    "ortality"         -0.67
    " traject"         -0.67
    " ORDER"           -0.66
    "igsaw"            -0.64
    " longitudinal"    -0.63
    Positive Logits
    Token               Logit
    " liar"             1.03
    " coward"           0.93
    " irresponsible"    0.92
    " traitor"          0.91
    " unfit"            0.90
    " hypocr"           0.89
    " unworthy"         0.87
    " disgrace"         0.84
    " disrespectful"    0.82
    " insensitive"      0.81
    Activation Density: 0.387%

    No Known Activations