    Explanations

    harmful and inappropriate content

    Negative Logits
    చు            0.68
     murdered     0.68
     slain        0.65
     killings     0.65
     मरने          0.64
     stabbed      0.64
     murders      0.62
     humble       0.62
     précéd       0.61
     boredom      0.61
    Positive Logits
     inappropriate      2.55
     unethical          2.31
     unsuitable         2.07
     unsustainable      2.01
     improper           2.00
     harmful            2.00
     inappropri         1.98
     unhealthy          1.97
     irresponsible      1.93
     inappropriately    1.92
    Act Density 2.043%
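
The density figure above is usually read as the share of token positions on which a feature fires. As a minimal sketch of that reading, assuming density means "fraction of positions with activation above a threshold" (the page does not state its exact definition), the function name `activation_density`, the `threshold` parameter, and the sample values below are all hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: this is one common definition of activation density;
    the source page does not say how its figure was computed.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    # Count positions where the feature is considered "active".
    fired = sum(1 for a in values if a > threshold)
    return fired / len(values)


# Made-up activations for a single feature over 10 token positions.
example_acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0]
density = activation_density(example_acts)
print(f"{density:.3%}")
```

Under this reading, a density of 2.043% would correspond to the feature being active on roughly 1 in 49 token positions.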

    No Known Activations