    Explanations

    offensive or inappropriate content

    Negative Logits
     bakalım             0.84
     saver               0.71
     unbeaten            0.69
    спери                0.67
     दिलचस्प              0.67
    pherd                0.67
                         0.66
    দর                   0.65
     Scissors            0.65
    PickerController     0.64
    Positive Logits
     sexual              1.69
     sexually            1.68
     offensive           1.68
     content             1.56
     depictions          1.55
     vulgar              1.52
     hateful             1.52
     derogatory          1.51
     inappropriate       1.49
     misog               1.49
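    The page does not say how the logit lists above are produced. A common approach for sparse-autoencoder feature dashboards is to take the dot product of the feature's decoder direction with each token's unembedding vector and rank tokens by the result. The sketch below assumes that approach; the function name `logit_effects`, the toy 2-dimensional vectors, and all numbers are hypothetical and are not this feature's real weights.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: score each vocabulary token by the dot product between
# a feature's decoder direction and that token's unembedding vector, then
# return the k highest-scoring and k lowest-scoring tokens.
def logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    top_positive = ranked[::-1][:k]  # largest scores first
    top_negative = ranked[:k]        # smallest (most negative) scores first
    return top_positive, top_negative

# Toy example with made-up 2-dimensional vectors (not real model weights).
decoder_dir = [1.0, 0.5]
unembed = {
    " offensive": [1.0, 1.0],   # score 1.5
    " vulgar": [1.0, 0.0],      # score 1.0
    " content": [0.5, 0.0],     # score 0.5
    " unbeaten": [0.0, -1.0],   # score -0.5
    " saver": [-1.0, 0.0],      # score -1.0
}
pos, neg = logit_effects(decoder_dir, unembed, k=2)
print(pos)  # [(' offensive', 1.5), (' vulgar', 1.0)]
print(neg)  # [(' saver', -1.0), (' unbeaten', -0.5)]
```

    In this sketch negative scores keep their sign; a dashboard may instead display magnitudes, as the Negative Logits list here appears to do.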
    Act Density 2.982%
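    Activation density is usually read as the share of sampled token positions on which the feature fires. The sketch below assumes that "fires" means an activation strictly greater than zero; the function name `activation_density` and the sample activations are hypothetical.

```python
from typing import Sequence

# Hypothetical sketch: percentage of token positions where the feature's
# activation is strictly positive.
def activation_density(activations: Sequence[float]) -> float:
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)

# Made-up sample: 3 active positions out of 100.
acts = [0.0] * 97 + [0.8, 1.2, 0.3]
print(f"{activation_density(acts):.3f}%")  # 3.000%
```

    Under this reading, a density of 2.982% means the feature is active on roughly 3 of every 100 tokens sampled.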

    No Known Activations