INDEX
    Explanations

    phrases related to causing harm to oneself or others

    references to death and killing

    NEGATIVE LOGITS
    soType            -0.83
    eret              -0.82
    taboola           -0.76
    Wide              -0.76
    soDeliveryDate    -0.73
    å§«               -0.72
    ĸļ                -0.70
    tv                -0.69
    URI               -0.68
    worthiness        -0.67
    POSITIVE LOGITS
     innocent          0.84
     unborn            0.81
     unarmed           0.81
     intruder          0.80
     terrorists        0.78
     messenger         0.77
     senseless         0.75
     classmate         0.74
     murderer          0.74
     crap              0.73
    Activation Density: 0.161%

    No Known Activations