INDEX
Explanations
phrases related to discrimination and bias based on various characteristics such as race, gender, sexual orientation, and nationality
references to discrimination based on various characteristics such as race, sexual orientation, or disability
Negative Logits
eva       -0.82
iland     -0.77
jet       -0.75
cca       -0.69
Guard     -0.67
MO        -0.66
adia      -0.65
cember    -0.64
NAS       -0.64
jan       -0.64
Positive Logits
ethnicity          1.24
nationality        1.18
merit              1.13
whether            1.07
race               1.05
gender             1.04
characteristics    1.03
geography          1.02
demographics       1.01
similarity         1.00
Activations Density: 0.245%