INDEX
    Explanations

    references to graphic or inappropriate content, particularly in relation to violence and sexual themes

    Negative Logits
    Viitteet        -0.60
     createState    -0.57
     kağıt          -0.53
     المعيارى       -0.52
     rospy          -0.52
     flattered      -0.51
     potest         -0.50
    orcid           -0.50
     صوتيه          -0.48
     scattata       -0.48
    Positive Logits
    ="@+            0.60
     violent        0.59
     Shock          0.58
    censored        0.57
     parental       0.56
     aspects        0.56
    SourceChecksum  0.56
     Sho            0.55
     Violent        0.54
     VIOL           0.54
    Act Density 0.132%

    No Known Activations