    Explanations

    phrases and concepts related to threats and violence

    Negative Logits
    Token             Logit
    "iped"            -0.15
    " prefix"         -0.15
    "135"             -0.15
    "azor"            -0.15
    "295"             -0.14
    " lis"            -0.14
    "/categories"     -0.14
    " Headquarters"   -0.14
    "esper"           -0.14
    "602"             -0.14
    Positive Logits
    Token             Logit
    " kill"           0.35
    " Kill"           0.28
    " murder"         0.27
    "kill"            0.26
    "Kill"            0.24
    ".kill"           0.24
    " kid"            0.24
    " commit"         0.24
    "_kill"           0.23
    " kills"          0.23
    Activation Density: 0.291%

    No Known Activations