INDEX
Explanations
references to racial discrimination and inequality
Negative Logits
Karlov     -0.15
consect    -0.15
ecko       -0.15
ENU        -0.14
olest      -0.14
ampus      -0.14
compan     -0.14
ько        -0.14
743        -0.14
#Region    -0.13
Positive Logits
inferior      0.34
Infer         0.26
backward      0.26
infer         0.25
foreign       0.24
Races         0.23
races         0.23
primitives    0.22
backwards     0.21
foreigners    0.21
Activation Density: 0.166%