INDEX
    Explanations

    internet-related content with potential legal or offensive implications, specifically related to social media comments or online behavior

    NEGATIVE LOGITS
    urion            -0.77
    cells            -0.75
    ongevity         -0.73
     endurance       -0.71
    asio             -0.70
     Ports           -0.69
     Endurance       -0.69
    byss             -0.69
    ulton            -0.69
     regeneration    -0.68
    POSITIVE LOGITS
     derogatory      1.34
     pornographic    1.30
     blasp           1.30
     insulting       1.25
     objectionable   1.23
     lewd            1.20
     slurs           1.18
     misogyn         1.17
     hateful         1.16
     swast           1.15
    Activation Density: 0.746%

    No Known Activations