    Explanations

    mentions of racial issues or concepts in various contexts

    references to racial issues and profiling

    Negative Logits
    Token               Logit
    "uden"              -0.90
    "icular"            -0.87
    "ertodd"            -0.86
    "tower"             -0.85
    "hower"             -0.79
    "erva"              -0.78
    "dra"               -0.78
    "arent"             -0.77
    "rov"               -0.77
    "etsk"              -0.76
    Positive Logits
    Token               Logit
    " slurs"             1.15
    "ized"               1.00
    " minorities"        0.98
    " profiling"         0.94
    " violence"          0.93
    " discrimination"    0.91
    " supremacists"      0.90
    " affili"            0.89
    " stereotypes"       0.88
    " Equality"          0.88
    Activation density: 0.013%

    No Known Activations