INDEX
Explanations
references to institutions, events, and prominent figures in culture and history
Negative Logits
Token      Logit
ipsis      -0.16
ersiz      -0.15
ynchron    -0.15
gne        -0.15
ÑĢави      -0.14
sil        -0.14
onium      -0.14
uffers     -0.14
unist      -0.13
ansom      -0.13
Positive Logits
Token      Logit
White      0.23
White      0.20
.White     0.17
Whit       0.16
çϽ         0.16
Wh         0.16
Whites     0.16
WHITE      0.16
_white     0.16
WHITE      0.16
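The page does not say how the logit lists are produced. A common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and rank tokens by the resulting score. The sketch below illustrates that idea on toy numbers; `top_logit_effects`, the two-dimensional weights, and the four-token vocabulary are all hypothetical and are not taken from any real model.

```python
def top_logit_effects(decoder_direction, unembedding, vocab, k=2):
    """Rank tokens by the dot product of a feature direction with each
    token's unembedding row (assumed definition of 'logit effect')."""
    effects = [
        (token, sum(d * w for d, w in zip(decoder_direction, row)))
        for token, row in zip(vocab, unembedding)
    ]
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]          # most suppressed tokens, lowest first
    positive = ranked[::-1][:k]    # most promoted tokens, highest first
    return positive, negative


# Toy weights only, chosen so the ranking is easy to check by hand.
vocab = ["White", "Whit", "ipsis", "sil"]
unembedding = [
    [0.23, 0.5],
    [0.16, -0.1],
    [-0.16, 0.3],
    [-0.14, 0.0],
]
decoder_direction = [1.0, 0.0]

positive, negative = top_logit_effects(decoder_direction, unembedding, vocab)
print("positive:", positive)
print("negative:", negative)
```

With a real model the same ranking would run over the full vocabulary, and the scores would depend on any normalization applied before the unembedding, which this sketch ignores.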
Activations Density 0.018%
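The density figure is not defined on the page. Assuming it is the fraction of sampled tokens on which the feature has a nonzero activation, it could be computed roughly as below. `activation_density` and the toy activation list are illustrative only.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`
    (assumed definition; the dashboard does not state one)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 18 active tokens out of 100,000 gives the same percentage
# as the figure shown above.
toy_activations = [1.0] * 18 + [0.0] * 99_982
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.018% means the feature fires on roughly 18 of every 100,000 tokens.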