INDEX
Explanations
instances of punctuation and sentence-ending markers
Negative Logits
emer      -0.16
elay      -0.15
rey       -0.14
acey      -0.14
ibar      -0.14
Punk      -0.14
edd       -0.14
ZA        -0.14
inston    -0.14
armor     -0.13
Positive Logits
bern      0.15
ystore    0.15
kání      0.15
ури       0.14
ignum     0.14
latent    0.14
금        0.14
LETE      0.14
789       0.13
_js       0.13
Activations Density 0.111%