INDEX
    Explanations

    mentions of attacks or aggressive actions

    terms associated with aggressive discourse or confrontational communication

    Negative Logits
    "ITNESS"          -0.73
    " Suc"            -0.68
    " Norn"           -0.66
    "significant"     -0.66
    " Wonders"        -0.66
    " orderly"        -0.65
    " transitions"    -0.65
    " Alive"          -0.65
    "existent"        -0.63
    " Transform"      -0.63
    Positive Logits
    " leveled"        1.18
    " accusing"       1.18
    " tir"            1.17
    " against"        1.13
    " hurled"         1.11
    " levied"         1.07
    "against"         1.06
    " slurs"          1.02
    " denounce"       1.00
    " denouncing"     0.99
    Activation Density: 0.225%

    No Known Activations