INDEX
    Explanations

    explicit mentions of racism

    terms related to racism and accusations of racist behavior

    Negative Logits
    ITNESS      -0.88
    pad         -0.83
    icular      -0.78
    amina       -0.77
    earchers    -0.71
    ATURE       -0.71
    arios       -0.71
    Delivery    -0.71
    itness      -0.69
    stantial    -0.69
    Positive Logits
     slurs          1.27
     prejudice      1.02
     bigot          0.96
     tir            0.96
     slur           0.93
     racists        0.92
     hatred         0.92
     racist         0.91
     stereotyp      0.91
     stereotypes    0.91
    Act Density 0.047%
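
    The page does not define "Act Density". A common reading, assumed here, is the fraction of token positions at which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means firing positions divided by total positions.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the feature fires on 47 of 100,000 token positions.
sample = [1.0] * 47 + [0.0] * (100_000 - 47)
density = activation_density(sample)
print(f"{density:.3%}")
```

    Under this reading, a density of 0.047% would mean the feature is active on roughly 47 of every 100,000 tokens.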

    No Known Activations