INDEX
Explanations
identifiers related to diversity and discrimination such as race, ethnicity, nationality, and religion
references to social categories and identity markers such as race, ethnicity, gender, and disability
Negative Logits
Token     Logit
hiba      -0.87
ocket     -0.80
pload     -0.70
alore     -0.69
noon      -0.67
Ack       -0.66
Adds      -0.64
put       -0.63
akings    -0.63
kens      -0.62
Positive Logits
Token          Logit
ethnicity      1.35
nationality    1.13
gender         1.09
ethnic         1.06
ethnic         1.05
genders        1.03
Gender         1.01
gender         1.00
Ethnic         0.98
Gender         0.97
Activations Density: 0.202%