INDEX
    Explanations

    mentions of hate crimes and hate speech

    mentions of hate crimes and related terminology

    Negative Logits
    Token                        Logit
    "UNCH"                       -0.76
    "aver"                       -0.71
    "amina"                      -0.70
    "enture"                     -0.70
    "clinton"                    -0.69
    " reluct"                    -0.69
    "ITNESS"                     -0.68
    " Prospect"                  -0.68
    "BuyableInstoreAndOnline"    -0.68
    "atel"                       -0.67

    Positive Logits
    Token                        Logit
    "fulness"                     1.13
    " crimes"                     1.13
    " speech"                     1.05
    "fully"                       1.02
    "ful"                         0.97
    " crime"                      0.95
    "speech"                      0.91
    " Crimes"                     0.88
    " Speech"                     0.88
    " mobs"                       0.88

    Activation Density: 0.044%

    No Known Activations