INDEX
Explanations
content related to race and racial disparities
Negative Logits
Token          Logit
huy            -0.57
domain         -0.57
aldehyde       -0.56
domain         -0.53
Jîn            -0.53
nąć            -0.52
recevrez       -0.52
mité           -0.51
Scherer        -0.50
transacción    -0.50
Positive Logits
Token          Logit
BLM            1.01
Black          1.00
protests       0.92
Black          0.90
black          0.86
protesters     0.85
BLACK          0.83
BLACK          0.83
racial         0.82
BLM            0.80
Activations Density: 0.110%