Explanations
the word “black” when referring to race or African‐American identity.
Negative Logits
Defs           -0.07
.Expressions   -0.07
Nan            -0.07
guitarist      -0.07
路             -0.07
även           -0.07
pregunta       -0.07
Prophet        -0.06
Cash           -0.06
ush            -0.06
Positive Logits
black          0.08
shake          0.07
Black          0.07
chunks         0.07
黑             0.07
enorm          0.06
())            0.06
black          0.06
.".            0.06
premature      0.06
Activations Density 0.010%