    Explanations

    derogatory and offensive language

    Negative Logits
    enhagen           -0.72
    escal             -0.68
     corrid           -0.66
    earchers          -0.66
    ãĥ¼ãĥ³            -0.65
    paren             -0.64
    restricted        -0.64
     Passage          -0.64
     bilateral        -0.63
     uninterrupted    -0.63
    Positive Logits
    fuck              1.06
     bastard          1.01
     bitch            0.96
     asshole          0.96
    hole              0.95
    gery              0.95
     hypocr           0.94
     cunt             0.94
     crap             0.91
     liar             0.89
    Activation Density 0.167%

    No Known Activations