    Explanations

    words related to insults or derogatory remarks

    references to insults or derogatory language

    Negative Logits
    arijuana    -0.82
    20439       -0.78
    ilver       -0.75
    negie       -0.74
    etheus      -0.74
    aver        -0.72
    ccording    -0.70
    angler      -0.68
     Folder     -0.67
    agically    -0.66
    Positive Logits
     insult       1.37
     insults      1.17
     insulted     1.16
     insulting    1.05
    ingly         0.97
     disrespect   0.97
     humili       0.92
     prejudice    0.89
     offend       0.86
    ãĤ¹ãĥĪ        0.86
    Act Density 0.015%

    No Known Activations