Explanations
mentions of specific ethnic groups, possibly within a societal or political context
mentions of ethnic groups and discussions surrounding ethnicity-related topics
Negative Logits
Token      Logit
uden       -0.92
tower      -0.88
aday       -0.87
agher      -0.80
aunder     -0.78
ertodd     -0.74
etheus     -0.72
doors      -0.71
ocket      -0.71
20439      -0.70
Positive Logits
Token          Logit
cleansing      1.19
minorities     1.09
ities          1.05
minority       0.90
slurs          0.83
appropriation  0.82
ethnic         0.79
affiliation    0.79
backgrounds    0.79
supremacists   0.79
Activations Density: 0.026%
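
The source does not define how the density figure is computed. A common reading is the fraction of token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, its `threshold` parameter, and the toy data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 3 of them active.
toy = [0.0] * 9997 + [0.4, 1.2, 0.7]
print(f"{activation_density(toy):.3%}")
```

Under this reading, a density of 0.026% would correspond to the feature firing on roughly 26 of every 100,000 token positions.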