    Explanations

    phrases related to hate speech and hate crimes

    terminology related to hate and hate crimes

    Negative Logits
    aver            -0.80
    UNCH            -0.73
    Decre           -0.72
    idges           -0.72
    姫              -0.71
     Examination    -0.68
     Pione          -0.68
    interstitial    -0.68
    ufact           -0.67
    ODE             -0.67
    Positive Logits
    fully           1.10
    fulness         1.09
     crimes         0.96
    ful             0.88
     vengeance      0.85
     prejudice      0.82
     speech         0.79
    hound           0.78
     crime          0.77
     hate           0.76
    Act Density 0.030%

    No Known Activations