INDEX
    Explanations

    references to hate crimes and criminal activities

    Negative Logits
    icio          -0.81
    ãĥĥãĥī        -0.81
    é¾įåĸļ士      -0.74
    bits          -0.73
    ernand        -0.71
    comings       -0.71
    dit           -0.70
    BUS           -0.68
     indisp       -0.68
    adh           -0.67
    Positive Logits
     perpetrated   0.97
     retaliation   0.90
     prosecutions  0.89
     spree         0.89
     hotline       0.82
     targeting     0.82
     Victim        0.76
     prevention    0.76
     incidents     0.75
     accusation    0.74
    Activation Density: 0.027%

    No Known Activations