INDEX
    Explanations

    words related to violence and hate crimes

    references to violence and hate crimes

    Negative Logits
    shire        -0.88
    sonian       -0.79
    gio          -0.74
     Parables    -0.73
    phrine       -0.70
    hower        -0.70
     Oops        -0.69
     Guinness    -0.68
    ROM          -0.68
     Lunar       -0.67
    Positive Logits
     intimidation    0.98
     violence        0.98
     retaliation     0.94
     indiscrim       0.93
     perpetrated     0.93
     harassment      0.93
     harass          0.92
     retribution     0.89
     persecution     0.89
     slurs           0.85
    Act Density 0.357%

    No Known Activations