    Explanations

    terms related to racial issues and discrimination

    references to racial issues and discrimination

    Negative Logits
    uden        -0.98
    amina       -0.82
    tower       -0.81
    oning       -0.80
    rov         -0.80
    icular      -0.78
    20439       -0.78
    dra         -0.77
    ertodd      -0.76
    debian      -0.76

    Positive Logits
     slurs           1.08
     minorities      0.99
     racial          0.97
    ized             0.93
     caste           0.93
     profiling       0.89
     affili          0.88
     backgrounds     0.86
     discrimination  0.84
     stereotypes     0.84

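The page does not say how these logit values are computed. A common approach for sparse-autoencoder feature dashboards is to project the feature's decoder direction onto the model's unembedding matrix and read off the most negative and most positive tokens. The sketch below assumes exactly that; `decoder_dir`, `W_U`, `vocab` and `top_logits` are hypothetical names, and the toy numbers are illustrative, not this feature's real weights.

```python
import numpy as np


def top_logits(decoder_dir: np.ndarray, W_U: np.ndarray, vocab: list, k: int = 10):
    """Return the k most negative and k most positive tokens for one feature.

    Assumes decoder_dir has shape (d_model,) and W_U has shape (d_model, d_vocab).
    """
    # Direct effect of the feature direction on every token's logit: shape (d_vocab,).
    logits = decoder_dir @ W_U
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary (hypothetical values).
vocab = [" slurs", " racial", "debian", "uden"]
W_U = np.array([[1.0, 0.5, -0.2, -1.0],
                [0.0, 0.5, 0.0, 0.0]])
decoder_dir = np.array([1.0, 0.0])

neg, pos = top_logits(decoder_dir, W_U, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

Tokens keep their leading space (" slurs" vs "ized"), which is why the positive list above mixes word-initial tokens with a word-internal suffix.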
    Activation Density: 0.015%
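Activation density is usually read as the percentage of sampled tokens on which the feature is active. The page does not state the sample dataset or threshold, so the sketch below assumes "active" means an activation strictly greater than zero; `activation_density` is a hypothetical helper. At 0.015%, the feature fires on roughly 15 tokens in every 100,000, about 1 in 6,700.

```python
import numpy as np


def activation_density(activations: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition).

    activations: 1-D array with one feature activation per sampled token.
    """
    return 100.0 * float(np.mean(activations > threshold))


# Toy data: the feature fires on 3 of 20,000 tokens.
acts = np.zeros(20_000)
acts[[10, 500, 19_999]] = [0.7, 1.2, 0.4]
print(f"{activation_density(acts):.3f}%")  # 3 / 20,000 = 0.015%
```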

    No Known Activations