INDEX
    Explanations

    phrases indicating threats or potential violence

    Negative Logits
    Token          Logit
    "три"          -0.15
    "iped"         -0.15
    "653"          -0.15
    "umbs"         -0.14
    "atos"         -0.14
    " Grove"       -0.14
    " reusable"    -0.14
    "undos"        -0.14
    "591"          -0.13
    " offenses"    -0.13
    Positive Logits
    Token           Logit
    " electro"      0.23
    " lyn"          0.20
    " staple"       0.18
    " perman"       0.17
    " sued"         0.17
    " pitch"        0.17
    " permanently"  0.17
    " punch"        0.16
    " Electro"      0.16
    " sue"          0.16
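The logit lists above rank vocabulary tokens by how strongly the feature pushes their output logits down or up. A common way to produce such lists for a sparse-autoencoder feature, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each token's column of the model's unembedding matrix and sort the result. The sketch below does this in plain Python; `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical and do not reproduce the values listed above.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding column.

    decoder_dir has length d_model; unembed has d_model rows of length len(vocab),
    so unembed[i][t] is the weight from residual dimension i to token t.
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder dimension")
    return {
        token: sum(d * row[t] for d, row in zip(decoder_dir, unembed))
        for t, token in enumerate(vocab)
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted and the k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, four-token vocabulary.
    vocab = [" sue", " punch", " Grove", "653"]
    decoder_dir = [1.0, 0.5]
    unembed = [
        [0.25, 0.5, -0.5, -0.125],
        [1.0, 0.0, 0.0, -0.5],
    ]
    positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
    print("Positive:", positive)  # [(' sue', 0.75), (' punch', 0.5)]
    print("Negative:", negative)  # [(' Grove', -0.5), ('653', -0.375)]
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a real model, a tensor library would compute the matrix-vector product far faster than these Python loops.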
    Activation Density: 0.221%
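Activation density summarizes how often the feature fires across sampled text. The sketch below assumes it is the percentage of tokens whose activation is strictly above zero; the function name `activation_density`, its `threshold` parameter, and the toy activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature's activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Toy data: 221 firing tokens out of 100,000 sampled tokens.
    acts = [0.0] * 99_779 + [1.3] * 221
    print(f"Activation density: {activation_density(acts):.3f}%")  # Activation density: 0.221%
```

A density this low means the feature is silent on the vast majority of tokens, which is consistent with a narrowly scoped explanation such as the one given above.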

    No Known Activations