INDEX
Explanations
special characters and punctuation marks
Negative Logits
393         -0.16
hiro        -0.15
orgen       -0.15
ulet        -0.14
qua         -0.14
é«          -0.14
hiba        -0.13
defer       -0.13
Deferred    -0.13
é±          -0.13
Positive Logits
zin         0.17
Dün         0.16
(___        0.16
oce         0.15
(__         0.15
.__         0.15
(_)         0.15
Tunnel      0.14
rud         0.14
Purple      0.14
Activations Density 0.002%