Explanations
topics related to racial diversity and systemic inequality
Negative Logits
Token      Logit
inet       -0.16
ilde       -0.15
vron       -0.15
onec       -0.15
rement     -0.15
aphrag     -0.15
GRAM       -0.14
ÐłÐµÑģп     -0.14
ırak       -0.14
pedia      -0.14
Positive Logits
Token      Logit
race        0.55
racial      0.53
Race        0.49
Race        0.47
race        0.44
African     0.44
racial      0.44
races       0.41
_race       0.41
racially    0.40
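Logit lists like the two above are commonly produced by projecting a feature's output (decoder) direction onto the model's unembedding and ranking tokens by the result. The sketch below shows that ranking step. It assumes this page was built the same way, which the page itself does not state. The names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Assumed method: dot the feature's decoder direction with each token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


if __name__ == "__main__":
    # Hypothetical 2-dimensional toy example, not real model weights.
    direction = [1.0, 0.0]
    vocab = {
        "race": [0.5, 0.2],
        "racial": [0.4, 0.9],
        "inet": [-0.2, 0.3],
        "pedia": [-0.1, 0.0],
    }
    pos, neg = top_and_bottom(logit_effects(direction, vocab), k=2)
    print(pos)  # [('race', 0.5), ('racial', 0.4)]
    print(neg)  # [('inet', -0.2), ('pedia', -0.1)]
```

Under this assumed method, a token can appear twice in the list (for example `race` or `Race`) when the vocabulary contains variants that render identically here, such as one with a leading space and one without.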
Activation Density: 0.366%
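Read as a fraction, 0.366% means the feature is active on roughly 3.66 of every 1,000 tokens. The sketch below shows how such a figure could be computed. It assumes that "density" counts token positions where the activation is strictly above zero, which is the usual convention for sparse features but is not stated on this page. The function name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds `threshold`.

    Assumption: a feature counts as "active" when its activation is > threshold (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Hypothetical activations over 1,000 tokens, with the feature firing on one of them.
    acts = [0.0] * 999 + [2.5]
    print(activation_density(acts))  # 0.1
```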