Explanations
references to racial dynamics and biases in society
Negative Logits
Token       Logit
consumers   -0.14
bào         -0.14
еле         -0.14
밀          -0.14
ÖL          -0.14
Consumers   -0.13
USR         -0.13
thora       -0.13
LOUR        -0.13
consumer    -0.12
Positive Logits
Token          Logit
firefighter    0.41
firefighters   0.39
Fire           0.38
fire           0.34
Fire           0.34
firefight      0.33
/fire          0.30
fire           0.29
FIRE           0.29
-fire          0.28
Activations Density 0.003%
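
As a rough illustration of the density figure, the sketch below assumes that "activations density" means the share of sampled tokens on which the feature has a nonzero activation. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration, not details taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: a token counts as active when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 3 active tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.003% corresponds to about 3 active tokens per 100,000 sampled, which marks the feature as very sparse.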