INDEX
Explanations
mentions of discrimination based on various factors such as race, sexual orientation, and nationality
references to discrimination in various contexts
Negative Logits
Token          Logit
Interstitial   -0.86
sis            -0.72
ski            -0.71
Adds           -0.70
zyme           -0.68
ECD            -0.67
orah           -0.67
tom            -0.65
bold           -0.65
Sed            -0.64
Positive Logits
Token            Logit
discrimination   1.22
Discrimination   1.03
prejudice        0.96
discriminated    0.94
yip              0.91
rimination       0.87
discriminate     0.87
discrimination   0.85
prejudices       0.84
protections      0.81
Activations Density 0.016%
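
The density figure above is most naturally read as the fraction of token positions on which this feature has a nonzero activation. That reading is an assumption; the page itself does not define the term. A minimal sketch under that assumption follows. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to produce a value of the same magnitude.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumes "density" means the share of token positions where the
    feature fires; the dashboard does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up data: the feature fires on 2 of 12,500 token positions.
sample = [0.0] * 12_498 + [3.1, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.016%
```

A density this low would mean the feature is sparse: it fires on roughly one or two tokens in every ten thousand.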