INDEX
Explanations
terms related to racial identities and racial discrimination
references to race and ethnicity in the context of discrimination and social justice issues
Negative Logits
Token      Logit
erva       -0.90
unction    -0.85
cit        -0.83
ows        -0.76
irs        -0.76
Mub        -0.74
uden       -0.74
odder      -0.73
iaries     -0.72
anmar      -0.72
Positive Logits
Token            Logit
Equality         0.91
course           0.90
blind            0.82
Discrimination   0.80
horse            0.78
boat             0.76
slurs            0.76
Race             0.75
Menu             0.74
race             0.73
Activations Density 0.018%