INDEX
    Explanations

    vulgar and offensive terms

    references to sexual or vulgar terms

    Negative Logits
    iation        -0.83
    othy          -0.75
    ril           -0.75
    aneously      -0.75
    GRE           -0.74
    VERTISEMENT   -0.73
    ORGE          -0.68
    reek          -0.67
    OR            -0.66
    oS            -0.66
    Positive Logits
    ussy          0.97
    ignt          0.82
     panties      0.80
    holes         0.79
    cat           0.79
    hole          0.79
     Riot         0.78
    essee         0.77
    chet          0.77
     lips         0.76
    Act Density 0.011%

    No Known Activations