INDEX
    Explanations

    offensive and derogatory terms

    terms related to sexuality and derogatory language

    Negative Logits
    ulhu        -0.84
    actic       -0.81
    scl         -0.79
    owan        -0.79
    sonian      -0.77
    son         -0.74
     Flavoring  -0.73
    ointment    -0.73
    ERG         -0.72
    acter       -0.71
    Positive Logits
     panties    0.94
     nuns       0.80
     pussy      0.78
     lips       0.77
     vagina     0.76
     Melania    0.76
     breasts    0.74
     Lucia      0.73
     boobs      0.73
     Riot       0.72
    Activation Density: 0.054%

    No Known Activations