INDEX
    Explanations

    words related to hatred and hate crimes

    references to hate crimes and hate speech

    Negative Logits
    Token        Logit
    "ufact"      -0.80
    "æ©Ł"        -0.80
    "aver"       -0.76
    "idges"      -0.73
    "Decre"      -0.73
    "clinton"    -0.71
    "é¾įå"       -0.71
    "ioned"      -0.71
    " Tablet"    -0.71
    "UNCH"       -0.70
    Positive Logits
    Token           Logit
    "fulness"       1.13
    "fully"         1.11
    " crimes"       1.01
    " vengeance"    0.91
    "ful"           0.89
    " hate"         0.84
    "bre"           0.82
    " prejudice"    0.82
    "hate"          0.80
    " crime"        0.79
    Activation Density: 0.023%

    No Known Activations