Explanations
slashes or dividers in text
Negative Logits
lijk      -0.17
alog      -0.17
ubat      -0.16
lag       -0.16
nox       -0.15
cky       -0.14
leta      -0.14
xiety     -0.14
bet       -0.14
ized      -0.14
Positive Logits
Ë         0.18
SWG       0.16
ydk       0.15
νη        0.15
gle       0.14
ìĬµ       0.14
buz       0.14
Sentinel  0.14
iddy      0.14
453       0.13
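The two lists above appear to be the ten most negative and ten most positive entries of a per-token score vector. Below is a minimal sketch of that selection step. It assumes the scores are available as a plain token-to-value mapping. The `top_and_bottom_k` helper is hypothetical, and the example uses only a subset of the values shown above.

```python
def top_and_bottom_k(
    scores: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k lowest-scoring and k highest-scoring (token, value) pairs.

    Hypothetical helper: assumes scores is a simple token -> value mapping.
    """
    # Sort once in ascending order of value.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return bottom, top


# A subset of the values listed above.
scores = {"lijk": -0.17, "ubat": -0.16, "nox": -0.15, "Ë": 0.18, "SWG": 0.16, "ydk": 0.15}
negative, positive = top_and_bottom_k(scores, k=2)
print(negative)  # [('lijk', -0.17), ('ubat', -0.16)]
print(positive)  # [('Ë', 0.18), ('SWG', 0.16)]
```

Sorting once and slicing both ends keeps the two lists consistent with each other. For a full vocabulary, a partial selection such as `heapq.nsmallest` / `heapq.nlargest` would avoid the full sort.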
Activation Density 0.022%
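Activation density is presumably the share of evaluated tokens on which the feature fires. Under that reading, 0.022% corresponds to roughly 22 tokens per 100,000. Below is a minimal sketch of the arithmetic. It assumes that "fires" means an activation strictly above zero. The threshold and the `activation_density` name are illustrative assumptions, not the dashboard's documented definition.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of activations strictly above `threshold`.

    Assumption: a feature "fires" on a token when its activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 22 firing tokens out of 100,000 evaluated tokens.
acts = [0.0] * 99_978 + [1.3] * 22
print(f"{activation_density(acts):.3f}%")  # 0.022%
```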