INDEX
Explanations
terms related to ethnicity and race
Negative Logits
iero        -0.18
른          -0.17
aries       -0.15
ματα        -0.15
Chop        -0.15
fak         -0.14
yun         -0.14
ylon        -0.14
zen         -0.14
illance     -0.14
Positive Logits
minority    0.21
minorities  0.21
Minority    0.19
-specific   0.19
cleansing   0.18
/r          0.18
ities       0.17
oot         0.17
Cleans      0.16
ональ       0.16
Activations Density 0.019%