INDEX
Explanations
phrases related to discrimination and bias against specific groups or individuals
references to discrimination and marginalized groups
Negative Logits
gency         -0.78
forward       -0.75
ciation       -0.72
Tycoon        -0.72
éĹĺ           -0.70
fax           -0.69
fecture       -0.68
dust          -0.67
Logged        -0.67
osate         -0.66

Positive Logits
minorities     1.67
marginalized   1.39
gays           1.36
women          1.33
minority       1.32
homosexuals    1.32
LGBTQ          1.32
transgender    1.28
Latinos        1.27
lesbians       1.27
Activations Density 0.235%
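
The density figure can be read as the share of tokens on which the feature fires at all. The sketch below illustrates that reading only; the page does not state how the number was computed, so the function name `activation_density`, the `threshold` parameter, and the "strictly greater than zero" rule are all assumptions made for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Hypothetical helper: assumes "density" means the share of tokens on
    which the feature is active, which the source page does not confirm.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    # Count tokens where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy input: the feature is active on 1 of 4 tokens.
toy_density = activation_density([0.0, 0.0, 1.5, 0.0])
```

Under this reading, a density of 0.235% would correspond to the feature being active on 235 out of every 100,000 tokens.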