Explanations
the name "White"
occurrences of the name "White."
Negative Logits
ITAL      -0.86
ngth      -0.81
igslist   -0.80
ategory   -0.78
yrinth    -0.78
cffffcc   -0.77
itals     -0.76
itual     -0.76
rative    -0.75
=-=-=-=-  -0.75
Positive Logits
caps          1.16
supremacist   1.10
house         1.09
horn          1.05
supremacists  1.02
berry         1.02
horse         1.01
Sox           0.99
bread         0.97
houses        0.96
Activations Density 0.025%