INDEX
    Explanations

    mentions or discussions of racist behaviors or beliefs

    occurrences and discussions of racism

    Negative Logits
    hower         -0.81
    Delivery      -0.81
    pad           -0.80
    amina         -0.80
    icular        -0.80
    RH            -0.79
    tis           -0.78
    avez          -0.78
    ership        -0.77
    weeney        -0.76
    Positive Logits
     slurs         1.15
     racist        0.95
     stereotyp     0.91
     nationalist   0.90
     racists       0.90
     prejudice     0.89
     stereotypes   0.88
     caric         0.86
     tir           0.86
     sexist        0.86
    Activation Density: 0.029%

    No Known Activations