INDEX
Explanations
terms related to racial identity and discrimination
topics related to race and ethnicity
Negative Logits
Token     Logit
erva      -0.84
unction   -0.78
UNE       -0.77
cit       -0.76
irs       -0.75
uden      -0.74
ushima    -0.74
orage     -0.74
Mub       -0.74
unker     -0.73
Positive Logits
Token            Logit
course           1.03
Equality         0.86
horse            0.81
blind            0.78
slurs            0.78
prejudice        0.77
bending          0.76
hair             0.76
Discrimination   0.75
relations        0.74
Activations Density: 0.018%