INDEX
    Explanations

    negative or offensive language

    expressions related to societal criticism and personal discontent

    Negative Logits
    "displayText"     -0.90
    "UMP"             -0.74
    "raft"            -0.64
    " Immediately"    -0.62
    "announced"       -0.62
    " Payments"       -0.62
    " reopened"       -0.61
    " IPM"            -0.61
    "LET"             -0.61
    "raltar"          -0.60
    Positive Logits
    " shitty"         0.99
    " fucking"        0.92
    " goddamn"        0.90
    " shit"           0.89
    " fuckin"         0.85
    " sociop"         0.85
    " fucked"         0.83
    " patriarchy"     0.81
    " kinda"          0.81
    " misogyn"        0.80
    Act Density 1.178%

    No Known Activations