Explanations
terms related to race and discrimination
Negative Logits
ocket              -0.76
soDeliveryDate     -0.74
ingen              -0.74
imar               -0.73
inventoryQuantity  -0.73
anmar              -0.72
issan              -0.69
UNE                -0.68
TOR                -0.66
urances            -0.65
Positive Logits
prejudice          0.96
Discrimination     0.93
ethnicity          0.91
ancestry           0.88
stereotyp          0.86
supremacist        0.86
discrimination     0.82
identity           0.82
slurs              0.81
prejudices         0.80
Activations Density 0.106%