INDEX
Explanations
instances of racism
terms related to racism and its manifestations
Negative Logits
icular        -0.83
Delivery      -0.81
pad           -0.80
earchers      -0.79
imen          -0.77
irs           -0.75
ITNESS        -0.74
amina         -0.73
Pad           -0.72
Vs            -0.71
Positive Logits
slurs          1.06
prejudice      0.99
stereotyp      0.84
hatred         0.82
racists        0.82
racist         0.81
ethnic         0.80
racism         0.79
stereotypes    0.78
nationalist    0.78
Activation Density: 0.023%