INDEX
    Explanations

    terms related to racial bias or discrimination

    topics related to race and racial issues

    Negative Logits
    Token       Logit
    "uden"      -0.97
    "icular"    -0.83
    "tower"     -0.81
    "20439"     -0.80
    "ertodd"    -0.78
    "amina"     -0.76
    "rov"       -0.76
    "dra"       -0.76
    "debian"    -0.75
    "OHN"       -0.74
    Positive Logits
    Token               Logit
    " slurs"            1.16
    "ized"              0.99
    " minorities"       0.98
    " profiling"        0.95
    " racial"           0.95
    " caste"            0.94
    " violence"         0.93
    " stereotypes"      0.93
    " discrimination"   0.92
    " affili"           0.91
    Activation Density: 0.015%

    No Known Activations