INDEX
    Explanations

    content related to harmful or offensive behavior and violations of guidelines

    Negative Logits
        Token       Logit
        "uble"      -0.18
        "rego"      -0.16
        "ocular"    -0.14
        "лоп"       -0.14
        "ution"     -0.14
        "alie"      -0.14
        "765"       -0.14
        "ogle"      -0.14
        "yssey"     -0.14
        "ogany"     -0.14
    Positive Logits
        Token          Logit
        " offensive"   0.20
        " Offensive"   0.16
        " nudity"      0.15
        " invasion"    0.15
        "Ùħبر"         0.15
        "esin"         0.15
        "_again"       0.15
        "addle"        0.15
        " Content"     0.14
        " Heath"       0.14
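The positive and negative logit lists above are typically produced by projecting a feature's decoder direction through the model's unembedding and keeping the k largest and k smallest scores. This page does not state that. The sketch below is a minimal illustration under that assumption. The function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical. They are not taken from this dashboard or from any library, and the toy numbers are not the model's real weights.

```python
# Hypothetical sketch (assumption): a feature's logit effect on each token is the
# dot product of its decoder direction with that token's unembedding vector.
from typing import Dict, List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  unembedding: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score every vocabulary token by direction . unembedding[token]."""
    return {tok: sum(d * u for d, u in zip(direction, vec))
            for tok, vec in unembedding.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) tokens with their scores."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # ascending: most negative first
    positive = ranked[::-1][:k]    # descending: most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional example; tokens keep their leading spaces.
    direction = [1.0, 0.0]
    unembedding = {
        " offensive": [0.2, 0.5],
        "uble": [-0.18, 0.3],
        " Content": [0.14, -1.0],
    }
    neg, pos = top_and_bottom(logit_effects(direction, unembedding), k=1)
    print("negative:", neg)
    print("positive:", pos)
```

Because the score is linear in the direction, this kind of ranking reflects only the feature's direct path to the output logits. It ignores any effect the feature has through later layers.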
    Activation Density: 0.050%
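Activation density is commonly read as the percentage of token positions on which the feature fires. This is an assumption, since the page gives no definition. Under that reading, 0.050% means roughly 1 token in 2,000. The helper `activation_density` below is hypothetical, and so is its firing threshold of `0.0`.

```python
# Hypothetical sketch (assumption): density = percentage of token positions whose
# feature activation is strictly above a threshold (0.0 by default).
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of positions where the activation exceeds `threshold`."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # One firing position out of 2,000 corresponds to a density of 0.05%.
    acts = [0.0] * 1999 + [0.7]
    print(f"{activation_density(acts):.3f}%")
```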

    No Known Activations