INDEX
    Explanations

    phrases related to harmful acts or actions

    words related to acts of violence and their consequences

    Negative Logits
    Token          Logit
    "arest"        -0.72
    " Plat"        -0.70
    "aver"         -0.67
    "oult"         -0.67
    "pole"         -0.66
    "iHUD"         -0.66
    "hack"         -0.65
    " therapy"     -0.64
    "oret"         -0.63
    "arro"         -0.61
    Positive Logits
    Token            Logit
    " perpetrated"   1.07
    " committing"    0.92
    " withd"         0.91
    "interstitial"   0.90
    " committed"     0.86
    "ahime"          0.84
    " impunity"      0.82
    "ãĥ¼ãĥĨ"         0.80
    " heinous"       0.80
    "20439"          0.78
    Activation Density 0.007%

    No Known Activations