Explanations
mentions of discrimination based on various factors, such as sexual orientation, race, and disability
references to discrimination, particularly in legal or social contexts
Negative Logits
Token   Logit
ski     -0.72
Adds    -0.71
cycle   -0.71
tom     -0.70
TOR     -0.68
DCS     -0.68
bold    -0.67
hran    -0.66
sis     -0.66
links   -0.65
Positive Logits
Token           Logit
discrimination  0.97
rimination      0.92
prejudice       0.85
discriminated   0.84
Discrimination  0.82
protections     0.79
retaliation     0.79
slurs           0.78
prejud          0.78
discriminating  0.77
Activations Density: 0.034%