    Explanations

    references to race

    references to race and related concepts

    Negative Logits
        "cit"              -0.80
        "UNE"              -0.79
        "erva"             -0.75
        "irs"              -0.74
        "anmar"            -0.73
        "orage"            -0.72
        "hiba"             -0.71
        "unction"          -0.71
        "ickson"           -0.70
        "psons"            -0.70

    Positive Logits
        "course"            0.91
        " Equality"         0.88
        "blind"             0.87
        " prejudice"        0.83
        " slurs"            0.82
        " supremacy"        0.82
        " relations"        0.82
        " Discrimination"   0.81
        "bending"           0.76
        " purity"           0.75
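
    Logit lists like the two above are commonly produced by multiplying a feature's decoder direction by the model's unembedding matrix (W_U) and reading off the highest and lowest scores. The following is a minimal sketch under that assumption only. This page does not state how its lists were computed, and the function names logit_effects and top_and_bottom, the toy matrix, and the toy vocabulary are hypothetical placeholders, not real model data.

```python
# Minimal sketch (assumption): a feature's effect on each vocabulary token's
# logit is its decoder direction projected through the unembedding matrix W_U.
# All names and numbers here are hypothetical toy values, not real model data.
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], w_u: List[List[float]]) -> List[float]:
    """Compute decoder_dir @ w_u, where w_u has shape d_model x vocab_size."""
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_and_bottom(
    scores: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    pairs = list(zip(vocab, scores))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example: d_model = 2 and a vocabulary of 4 hypothetical tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, 0.5]
w_u = [
    [0.9, 0.8, -0.7, -0.6],
    [0.2, 0.1, -0.2, 0.0],
]
scores = logit_effects(decoder_dir, w_u)
positive, negative = top_and_bottom(scores, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

    With a real model the unembedding matrix has one column per vocabulary token, so the same two steps apply unchanged; at that scale a vectorized matrix product (for example with NumPy) would be the idiomatic choice instead of the pure-Python loops used here for clarity.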

    Activation density: 0.029%

    No Known Activations