INDEX
    Explanations

    references to different types of violence and harassment

    topics related to violence and harassment

    Negative Logits
    Token           Logit
    "mega"          -0.72
    " Slim"         -0.66
    "pop"           -0.64
    "izons"         -0.63
    " outlook"      -0.63
    "yssey"         -0.62
    " Blueprint"    -0.62
    "hedral"        -0.62
    "Bright"        -0.61
    "Zen"           -0.61
    Positive Logits
    Token               Logit
    " perpetrated"      1.30
    " inflicted"        1.25
    " oneself"          1.04
    " punishable"       1.04
    " offences"         1.03
    " prohibited"       1.02
    " unintentional"    1.00
    " uttered"          0.99
    " inflic"           0.99
    " manslaughter"     0.98
    Activation Density: 0.338%

    No Known Activations