INDEX
Explanations
mentions of race or skin color, especially in discussions of discrimination and social issues
references to people of color and related discussions on social issues
Negative Logits
EMS       -0.82
INGTON    -0.82
chn       -0.79
sonian    -0.78
ãĤ´       -0.78
sg        -0.74
pload     -0.74
CHA       -0.74
ERSON     -0.73
xon       -0.73
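The token shown as ãĤ´ in the negative logits looks like a byte-level BPE rendering rather than readable text. The sketch below assumes, without confirmation from this page, that the underlying tokenizer uses the GPT-2 style byte-to-unicode mapping; the helper names bytes_to_unicode and decode_token are illustrative only. Under that assumption, the token's characters map back to raw bytes that can be decoded as UTF-8.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at or above 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Invert the table: displayed character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable UTF-8 text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("ãĤ´"))
```

Plain ASCII tokens such as EMS or chn pass through decode_token unchanged, so only tokens containing multi-byte characters or whitespace markers differ from what the dashboard displays.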
Positive Logits
slurs         0.85
minorities    0.80
backgrounds   0.78
communities   0.76
oppressed     0.75
flags         0.72
xual          0.70
queer         0.70
stereotypes   0.69
congreg       0.68
Activations Density 0.017%