INDEX
Explanations
Discussion of societal issues such as racism, and of diverse points of view
Negative Logits
Delivery         -0.79
icular           -0.78
earchers         -0.78
pad              -0.77
ership           -0.75
ishable          -0.73
avez             -0.73
TAIN             -0.72
largeDownload    -0.72
irs              -0.70
Positive Logits
slurs            0.98
prejudice        0.93
racist           0.87
racists          0.82
sexist           0.81
prejud           0.80
homophobic       0.79
stereotyp        0.79
tir              0.77
backlash         0.76
Activations Density: 5.763%