INDEX
Explanations
terms related to race or ethnicity
Negative Logits
NameInMap
-0.91
الحره
-0.71
Personendaten
-0.69
EconPapers
-0.68
"..\..\
-0.62
gonic
-0.62
={({
-0.60
consectetur
-0.60
PhysRev
-0.60
propOrder
-0.58
Positive Logits
white
1.29
white
1.23
White
1.15
White
1.12
WHITE
1.10
black
1.05
WHITE
1.00
白
1.00
black
0.99
whiteColor
0.93
Activations Density 0.204%