INDEX
    Explanations

    instances of violence or abusive behavior

    Negative Logits
     misiones        -0.41
     trữ             -0.41
     plán            -0.40
    úsqueda          -0.40
     resources       -0.40
     Vermögen        -0.39
     cesty           -0.39
    complexType      -0.38
    visionnement     -0.38
     Sicherung       -0.37
    Positive Logits
     beat            0.67
    beat             0.66
     punches         0.64
     BEAT            0.63
    httphttps        0.60
     Beat            0.60
     beating         0.59
     beaten          0.58
     ضرب             0.58
     beats           0.57
    Act Density 0.420%

    No Known Activations