    Explanations

    violence, abuse, and harmful behavior

    Negative Logits
    Token             Logit
    " dividend"       0.59
    " dividends"      0.49
    " Y"              0.46
    "dividend"        0.45
    " panning"        0.44
    " Tom"            0.42
    " artifacts"      0.42
    " Dividend"       0.42
    " 품질"            0.42
    " arbitrage"      0.41
    Positive Logits
    Token             Logit
    "Violence"        0.91
    " violência"      0.89
    " violence"       0.88
    " Violence"       0.88
    " perpetrators"   0.84
    " perpetrator"    0.84
    " violencia"      0.82
    "bullying"        0.81
                      0.80
    " bullying"       0.79
    Activation Density: 0.390%

    No Known Activations