Explanations
terms related to racial issues and injustices
Negative Logits
fak          -0.14
pling        -0.14
ares         -0.14
iram         -0.14
359          -0.14
endencies    -0.14
ioso         -0.14
ogl          -0.14
-worthy      -0.14
emin         -0.13
Positive Logits
/color        0.23
profiling     0.19
minorities    0.18
ized          0.18
-neutral      0.16
/class        0.16
/E            0.16
bait          0.16
icious        0.15
cleansing     0.15
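The two logit lists above rank output tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists can be produced, assuming (as is common for SAE feature dashboards, though this page does not state it) that each value is the feature's decoder direction projected through the model's unembedding matrix. The names `top_logits`, `w_dec_row` and `W_U` are hypothetical.

```python
import numpy as np

def top_logits(w_dec_row, W_U, vocab, k=10):
    # Assumed definition: the feature's direct effect on each output token is
    # its decoder direction (d_model,) projected through the unembedding
    # matrix (d_model, d_vocab), giving one value per vocabulary token.
    logits = w_dec_row @ W_U
    order = np.argsort(logits)  # token indices, most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage: d_model = d_vocab = 4 with an identity unembedding.
vocab = ["a", "b", "c", "d"]
W_U = np.eye(4)
w = np.array([0.2, -0.1, -0.05, 0.3])
neg, pos = top_logits(w, W_U, vocab, k=2)
print(neg)  # [('b', -0.1), ('c', -0.05)]
print(pos)  # [('d', 0.3), ('a', 0.2)]
```

Ranking one projected vector in both directions is why the two lists share a scale: the Negative Logits list is simply the bottom of the same ordering whose top is the Positive Logits list.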
Activation Density: 0.032%
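As a reading aid, a density of 0.032% means the feature fires on roughly 32 of every 100,000 tokens (0.032 / 100 × 100,000 = 32). A minimal sketch, assuming density is the share of token positions whose activation is above zero; the threshold and the function name `activation_density` are hypothetical, not taken from this page.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    # Percentage of token positions where the feature's activation
    # exceeds the threshold (assumed to be zero here).
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy usage: a feature that fires on 32 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:32] = 1.0
print(f"{activation_density(acts):.3f}%")  # 0.032%
```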