Explanations
references to white supremacist groups such as the Ku Klux Klan (KKK) and related terms
mentions and references to hate groups, specifically the Ku Klux Klan and related organizations
Negative Logits
Token         Logit
ochond        -0.78
RW            -0.77
phrine        -0.77
Downloadha    -0.77
neau          -0.76
cially        -0.73
*/(           -0.72
cing          -0.71
lessly        -0.69
ably          -0.68
Positive Logits
Token           Logit
Klux            1.34
Klan            1.31
KKK             1.06
affili          0.77
robes           0.77
supremacist     0.76
supremacists    0.74
affiliation     0.74
NAACP           0.73
imperson        0.72
Activations Density 0.009%
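
For readers unfamiliar with the density figure, the sketch below shows one way such a number could be computed. It assumes that "activation density" means the percentage of sampled tokens on which the feature activates above zero. The dashboard does not state its definition, so this is an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: density = share of sampled tokens where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 9 of 100,000 sampled tokens.
sample = [0.0] * 99_991 + [1.5] * 9
print(f"{activation_density(sample):.3f}%")  # prints "0.009%"
```

Under this reading, a density of 0.009% corresponds to roughly 9 activating tokens per 100,000, which is consistent with a feature that fires only on a narrow set of terms.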