INDEX
    Explanations

    words related to insults and derogatory language

    references to insults and derogatory remarks

    Negative Logits
    "negie"     -0.73
    "ccording"  -0.72
    "ullivan"   -0.70
    "ucket"     -0.68
    "illon"     -0.68
    "ilver"     -0.68
    "ail"       -0.67
    "arten"     -0.66
    "angler"    -0.65
    "ooth"      -0.65
    Positive Logits
    "ingly"        1.02
    " insult"      0.99
    "スト"         0.89
    " insulted"    0.86
    " insulting"   0.83
    " insults"     0.83
    " prejudice"   0.77
    " disrespect"  0.76
    " ridicule"    0.75
    " hurled"      0.74
    Activation Density: 0.022%

    No Known Activations