INDEX
Explanations
references to racial identity, specifically focusing on mentions of the term "whites."
references to the term "whites" in various contexts
Negative Logits
Brig      -0.75
bid       -0.71
yrinth    -0.68
ESCO      -0.67
ulous     -0.65
Allow     -0.65
ANK       -0.64
QB        -0.64
Stra      -0.64
rolog     -0.63
Positive Logits
whites         1.26
supremacists   0.99
lucent         0.89
paces          0.88
ervative       0.87
supremacist    0.86
pace           0.86
\\\\\\\\       0.84
peed           0.83
suprem         0.82
Activations Density 0.006%
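
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is obtained: the share of sampled token positions on which the feature fires at all. This is an assumption about how the statistic is defined, not a description of how this dashboard computed it. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumed definition: density = (active positions) / (all sampled positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 6 of 100,000 sampled token positions.
sample = [0.0] * 99_994 + [1.3, 0.4, 2.1, 0.9, 0.2, 3.5]
print(f"{activation_density(sample):.3f}%")  # prints 0.006%
```

Under this definition, a density of 0.006% means the feature is active on roughly 6 of every 100,000 tokens, which is typical of a sparse, narrowly scoped feature.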