INDEX
    Explanations

    words related to negative events such as attacks, crashes, and blasts

    references to violent incidents or attacks

    NEGATIVE LOGITS
    ophy          -0.71
    anium         -0.68
    uchin         -0.67
    Intern        -0.67
     iodine       -0.64
    hemy          -0.64
    glomer        -0.63
    anol          -0.63
     ILCS         -0.62
    bil           -0.62
    POSITIVE LOGITS
     spree        1.07
     occurred     0.86
     unfold       0.81
    âĶĢâĶĢ        0.81
     rampage      0.80
     happened     0.79
     unfolded     0.78
     stemmed      0.78
     ordeal       0.77
     perpetrated  0.75
    Activation Density: 0.200%

    No Known Activations