Explanations
ethnicities or nationalities
ethnic and racial identity references
Negative Logits
Token        Logit
forcement    -0.77
VIDEOS       -0.70
pload        -0.69
initions     -0.68
MENTS        -0.66
livion       -0.64
ythm         -0.63
rocal        -0.62
subsequ      -0.62
agement      -0.62
Positive Logits
Token        Logit
enough       0.92
enough       0.88
ethnic       0.84
;            0.81
Caucasian    0.81
ancest       0.81
supremacist  0.80
born         0.77
listed       0.77
immigrant    0.76
Activations Density: 0.156%