Explanations
References to the term "Western."
Negative Logits
anton        -0.19
nel          -0.19
allows       -0.16
loom         -0.16
turnstile    -0.15
295          -0.15
945          -0.14
306          -0.14
Olson        -0.14
olar         -0.14
Positive Logits
most         0.29
ized         0.26
blot         0.24
ization      0.24
s            0.24
-most        0.23
ers          0.22
esse         0.21
ised         0.21
Hemisphere   0.21
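Lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and keeping the most negative and most positive token scores. The sketch below illustrates that idea under stated assumptions: the decoder vector, unembedding matrix, and vocabulary are small hypothetical toy values, and the function names are illustrative, not part of any specific tool's API.

```python
def feature_logit_effects(decoder_vec, unembed, vocab):
    """Score each token by the dot product of the feature's decoder vector
    with that token's unembedding column.

    Assumed shapes (hypothetical): decoder_vec has length d_model,
    unembed is d_model rows x len(vocab) columns.
    """
    d_model = len(decoder_vec)
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_vec[k] * unembed[k][j] for k in range(d_model))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Hypothetical toy data, d_model = 2 and a 3-token vocabulary.
decoder_vec = [1.0, -0.5]
unembed = [
    [0.3, -0.2, 0.1],
    [0.1, 0.2, -0.4],
]
vocab = ["most", "anton", "ized"]

scores = feature_logit_effects(decoder_vec, unembed, vocab)
negative, positive = top_and_bottom(scores, k=1)
print(negative, positive)
```

In this toy case "anton" gets the lowest score and "ized" the highest; real dashboards apply the same ranking over the full vocabulary and report the top ten in each direction.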
Activation density: 0.011%
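Activation density is usually read as the percentage of tokens on which the feature fires at all. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0.0 (the threshold and function name are assumptions for illustration):

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations over four tokens; only one is positive.
print(activation_density([0.0, 0.0, 1.2, 0.0]))
```

On this reading, a density of 0.011% means the feature is active on roughly 11 of every 100,000 tokens, so it is a very sparse feature.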