INDEX
Explanations
mentions of the term "white"
references to the term 'white'
Negative Logits
Token         Logit
=-=-=-=-      -0.85
yrinth        -0.82
HCR           -0.81
SIGN          -0.78
cffffcc       -0.77
interstitial  -0.75
ategory       -0.74
itual         -0.74
=-=-          -0.73
Allow         -0.73
Positive Logits
Token         Logit
supremacist    1.24
supremacists   1.08
nationalist    0.98
suprem         0.94
white          0.89
supremacy      0.87
elephant       0.86
berry          0.84
nationalists   0.83
caps           0.82
Activation Density: 0.022%
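The two statistics on this page can be illustrated with a minimal sketch. It assumes that each logit value is a per-token effect that has already been computed (for example, by projecting the feature's decoder direction through the model's unembedding; this is an assumption, not a confirmed description of how this dashboard derives them). It also assumes that "density" means the fraction of tokens on which the feature's activation is positive. The function names `top_logits` and `activation_density` are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def top_logits(
    effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, logit) pairs.

    Assumption: `effects` maps each vocabulary token to the feature's
    logit effect on it, computed elsewhere.
    """
    # Largest values first, e.g. "supremacist" at the top of the positive list.
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    # Smallest (most negative) values first.
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return positive, negative


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens on which the feature fires (activation > 0).

    Assumption: "fires" means a strictly positive activation.
    """
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)
```

With a handful of the values listed above, such as `{"supremacist": 1.24, "white": 0.89, "yrinth": -0.82, "=-=-=-=-": -0.85}`, `top_logits(..., k=2)` puts `supremacist` and `white` on the positive side and `=-=-=-=-` and `yrinth` on the negative side. A density of 0.022% corresponds to the feature firing on roughly 22 of every 100,000 tokens.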