    Explanations

    mentions of racial issues or disparities

    terms related to racial issues and discrimination

    Negative Logits
    "ertodd"          -0.90
    "tower"           -0.80
    "kens"            -0.78
    "ipop"            -0.78
    "20439"           -0.78
    "amina"           -0.75
    "stadt"           -0.75
    "icular"          -0.75
    "hran"            -0.74
    "uden"            -0.73
    Positive Logits
    " slurs"           1.26
    "ized"             1.10
    " minorities"      1.02
    " prejudice"       1.01
    " disparities"     0.99
    " profiling"       0.98
    " discrimination"  0.98
    " disparity"       0.95
    " animosity"       0.94
    " stereotypes"     0.94
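    Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the per-token scores; this page does not state its method, so treat that as an assumption. A minimal stdlib sketch of that projection, where `top_logits`, `decoder_dir`, `unembed`, and `vocab` are hypothetical names and the toy values are invented, not taken from this feature:

```python
def top_logits(decoder_dir, unembed, vocab, k):
    """Return (most negative k, most positive k) (token, score) pairs.

    decoder_dir: hypothetical feature decoder direction, length d_model.
    unembed: hypothetical unembedding matrix, d_model rows of len(vocab) floats.
    """
    # Score each vocabulary token: dot product of the decoder direction
    # with that token's column of the unembedding matrix.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # lowest scores first
    positive = ranked[::-1][:k]  # highest scores first
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_logits([1.0, 0.5], [[1, 0, -1], [0, 4, 0]], ["a", "b", "c"], k=1)
print(neg, pos)
```

    Sorting once and slicing both ends keeps the negative and positive lists consistent with each other.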
    Activation Density: 0.035%
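    Activation density is usually reported as the percentage of sampled tokens on which the feature's activation exceeds zero, so 0.035% corresponds to roughly 35 tokens in every 100,000. A small sketch assuming that definition; `activation_density` and its `threshold` parameter are hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation is strictly above `threshold`."""
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 35 firing tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_965 + [1.0] * 35
print(activation_density(sample))
```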

    No Known Activations