Explanations
mentions of minority groups
references to various minority groups
Negative Logits
Token    Logit
wy       -0.76
aches    -0.70
osc      -0.62
dig      -0.61
MET      -0.60
dri      -0.60
Mous     -0.59
Ware     -0.59
Ready    -0.59
Spir     -0.58
Positive Logits
Token          Logit
minority       3.89
minorities     2.74
Minority       2.24
majority       1.73
majority       1.71
ority          1.47
Majority       1.38
marginalized   1.34
majorities     1.33
diversity      1.26
Activations Density 0.007%