Explanations
phrases related to white supremacy and discrimination
references to white supremacist groups and ideologies
Negative Logits
fixed          -0.77
Glass          -0.77
MAC            -0.72
spring         -0.72
zl             -0.71
inet           -0.71
Bus            -0.71
Body           -0.70
Eyes           -0.69
Medium         -0.67
Positive Logits
guiActiveUn     1.15
supremacist     1.05
supremacists    1.02
suprem          0.97
sympath         0.86
prejudice       0.79
ervative        0.78
supremacy       0.77
referen         0.77
ervatives       0.77
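A minimal sketch of how top positive and negative logit lists like these are commonly produced, assuming (not confirmed by this page) that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The names `logit_effects`, `top_k`, `decoder_dir`, and `unembed` are hypothetical, and the vectors are toy values, not the real model weights.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the (assumed) feature decoder
    direction with that token's (assumed) unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_k(effects: Dict[str, float], k: int,
          positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest scores (positive=True)
    or the smallest scores (positive=False)."""
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)[:k]


# Hypothetical toy example: a 2-dimensional residual stream and three tokens.
decoder_dir = [1.0, 0.5]
unembed = {
    "supremacist": [1.0, 0.2],
    "Glass": [-0.8, 0.1],
    "Body": [0.1, -0.4],
}
effects = logit_effects(decoder_dir, unembed)
print(top_k(effects, 1))                  # most-promoted token
print(top_k(effects, 1, positive=False))  # most-suppressed token
```

Sorting the full score dictionary is simple but costs O(V log V) over the vocabulary; for a large vocabulary, a partial selection such as `heapq.nlargest` would avoid sorting every token.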
Activation Density: 0.014%
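A minimal sketch of how an activation-density figure can be computed, assuming it means the percentage of tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the synthetic activations are hypothetical illustrations, not this dashboard's actual code or data.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`
    (assumed definition of activation density)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


# Hypothetical example: a feature that fires on 14 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(14):
    acts[i * 7000] = 2.5
print(f"{activation_density(acts):.3f}%")  # 0.014%
```

Under this definition, a density of 0.014% means the feature is active on roughly 1 token in 7,000, which fits a narrowly scoped feature like the one described in the explanations.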