INDEX
    Explanations

    instances of threats and intimidation directed towards individuals or groups

    Negative Logits
    "gorit"            -0.17
    "itech"            -0.16
    "оваÑĢ"            -0.14
    "ocker"            -0.14
    "emo"              -0.14
    "inha"             -0.14
    " inflicted"       -0.13
    "ocyte"            -0.13
    "FAST"             -0.13
    ".React"           -0.13

    Positive Logits
    " threats"          0.33
    " intimid"          0.25
    " threat"           0.25
    " intimidation"     0.25
    " targeted"         0.24
    " Threat"           0.24
    "-threat"           0.23
    " safety"           0.23
    " harassment"       0.23
    " threatened"       0.22

    Act Density 0.154%

    No Known Activations