Explanations
mentions of specific place names, particularly those related to Santa and associated locations
Negative Logits
ubern       -0.18
geh         -0.17
hetto       -0.16
ihu         -0.16
hin         -0.16
uitka       -0.16
ubl         -0.15
PRINTF      -0.15
Contents    -0.15
ack         -0.14
Positive Logits
Claus       0.19
clare       0.17
gram        0.16
atorium     0.16
com         0.16
angelo      0.15
립          0.15
Rosa        0.15
Barbara     0.14
eced        0.14
Activations Density 0.012%