Explanations
references to racial and socio-economic disparities
Negative Logits
avoid      -0.19
避         -0.17
Avoid      -0.16
olon       -0.16
Avoid      -0.16
avoid      -0.15
ازه        -0.15
ould       -0.15
prevent    -0.14
OLON       -0.14
Positive Logits
black      0.51
Black      0.44
black      0.43
Black      0.41
黑         0.40
BLACK      0.39
African    0.37
黒         0.37
-black     0.37
BLACK      0.34
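The logit lists above can be read as the tokens whose output logits this feature most directly raises (positive) or lowers (negative). A minimal sketch of how such lists are commonly derived follows. It assumes each token's score is the dot product of the feature's decoder direction with that token's unembedding column, ignoring the final LayerNorm. The helpers `logit_effects` and `top_logits` and all numbers are hypothetical and are not the dashboard's actual code.

```python
from typing import List, Tuple


# Hypothetical helper: score every vocabulary token by the dot product of the
# feature's decoder direction with that token's unembedding column.
# Assumption: final LayerNorm and any bias terms are ignored.
def logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],  # shape d_model x vocab_size
    vocab: List[str],
) -> List[Tuple[str, float]]:
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((token, score))
    return scores


# Split the scores into the k most promoted and k most suppressed tokens.
def top_logits(
    scores: List[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]  # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example with d_model = 2 and a three-token vocabulary (made-up numbers).
vocab = [" black", " avoid", " African"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.5, -0.2, 0.1],
    [0.0, 0.2, -0.4],
]
positive, negative = top_logits(logit_effects(decoder_dir, unembed, vocab), k=1)
print("positive:", positive)
print("negative:", negative)
```

Because the score is a plain dot product, a token can appear several times in such a list under different surface forms (leading space, capitalization), which is consistent with the repeated "black" and "avoid" entries above.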
Activation Density: 0.145%
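Activation density is usually reported as the share of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero. The helper `activation_density` and the sample data are hypothetical.

```python
from typing import List


# Hypothetical helper. Assumption: density is the fraction of sampled token
# positions at which the feature's activation is strictly above the threshold.
def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Made-up sample: the feature fires on 29 of 20,000 token positions.
sample = [0.0] * (20_000 - 29) + [1.0] * 29
density = activation_density(sample)
print(f"Activation Density: {density:.3%}")
```

Under this definition, a density of 0.145% means the feature is active on roughly one token in 690.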