INDEX
Explanations
references to the Eastern regions or cultures
Negative Logits
Token       Logit
northwest   -0.16
antro       -0.16
aison       -0.15
Western     -0.15
southwest   -0.14
western     -0.14
.dumps      -0.14
urch        -0.14
lect        -0.14
ando        -0.14
Positive Logits
Token       Logit
ablish      0.28
ern         0.22
Bloc        0.19
seab        0.19
wing        0.18
erner       0.18
ERN         0.17
bloc        0.17
hetic       0.17
gota        0.17
Activations Density 0.049%