INDEX
    Explanations

    offensive and derogatory language, including profanity and insults

    derogatory terms and insults

    Negative Logits
    Token           Logit
    "apers"         -0.82
    "mental"        -0.79
    "enza"          -0.76
    "ctica"         -0.74
    "ainted"        -0.73
    "ENC"           -0.73
    "âĹ¼"           -0.72
    "undai"         -0.72
    "ORN"           -0.71
    "inguished"     -0.71
    Positive Logits
    Token           Logit
    " bitch"        1.33
    "buster"        1.01
    "fuck"          0.96
    " asses"        0.96
    "hole"          0.95
    "holes"         0.91
    " cunt"         0.88
    " bastard"      0.86
    " whore"        0.84
    " bast"         0.81
    Act Density 0.008%
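
The page does not say how the density figure is computed. One common reading, assumed here, is the fraction of token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and do not come from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 2 of 25,000 token positions.
acts = [0.0] * 24_998 + [1.7, 0.4]
density = activation_density(acts)
print(f"Act Density {density:.3%}")
```

Under this reading, a density of 0.008% would mean the feature fires on roughly 8 of every 100,000 tokens, which fits a feature tied to a narrow set of words.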

    No Known Activations