INDEX
Explanations
terms related to discrimination based on characteristics such as race, ethnicity, nationality, religion, and disability
references to social categories related to identity and discrimination
Negative Logits
writers    -0.79
ahead      -0.75
Byr        -0.75
Drawn      -0.68
nels       -0.68
ateurs     -0.67
mr         -0.67
rats       -0.66
reb        -0.65
irect      -0.65
Positive Logits
ethnicity        1.86
nationality      1.75
gender           1.60
ethnic           1.49
creed            1.43
Gender           1.40
Ethnic           1.37
socioeconomic    1.35
Gender           1.30
sexuality        1.30
Activations Density 0.120%