INDEX
Explanations
terms related to different demographic characteristics or categories such as race, ethnicity, nationality, religion, and disabilities
terms related to social identity and discrimination categories
Negative Logits
writers    -0.79
Raphael    -0.72
Peng       -0.67
sers       -0.66
Canaver    -0.65
Kers       -0.64
Byr        -0.64
said       -0.64
HEL        -0.63
Dean       -0.62
Positive Logits
ethnicity      1.69
nationality    1.59
gender         1.54
Gender         1.39
ethnic         1.37
Gender         1.33
creed          1.28
sexuality      1.27
gender         1.26
Ethnic         1.25
Activations Density: 0.171%