INDEX
Explanations
phrases related to racial disparities, discrimination, and social inequities
Negative Logits
drip      -0.73
amina     -0.71
ellipt    -0.70
odium     -0.70
DCS       -0.68
ocket     -0.68
cold      -0.67
acles     -0.66
Telesc    -0.66
pload     -0.65
Positive Logits
minorities      1.16
supremacist     1.13
slurs           1.12
supremacists    1.12
ethnic          1.09
backgrounds     1.01
immigrants      1.01
ethnicity       1.00
males           1.00
nationalist     1.00
Activations Density: 4.396%