INDEX
Explanations
words related to racial diversity and demographics
mentions of racial or ethnic descriptors
Negative Logits
ĸļ            -0.85
Vinyl         -0.72
ļé            -0.71
erker         -0.70
Dragonbound   -0.63
İĭ            -0.63
Peaks         -0.63
Cod           -0.62
YC            -0.62
Clan          -0.62
Positive Logits
rha           1.03
vana          0.98
cles          0.90
andom         0.89
ror           0.88
mingham       0.88
ROR           0.85
ilateral      0.85
nie           0.84
rr            0.84
Activations Density 0.015%