Explanations
high-frequency numerical values and specific syntactical structures in code or data
Negative Logits
esco     -0.15
im       -0.15
mes      -0.14
.vm      -0.14
in       -0.14
ynes     -0.13
añ       -0.13
deaux    -0.13
imers    -0.13
261      -0.13
Positive Logits
nist     0.16
dash     0.16
azen     0.15
ubi      0.14
dict     0.14
iba      0.14
ichten   0.14
agma     0.14
Dict     0.14
xon      0.14
Activations Density 0.032%
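
A minimal sketch of how a density figure like this could be computed, assuming that "activations density" means the fraction of sampled token positions on which the feature has a nonzero activation. That definition, the function names, and the synthetic sample are illustrative assumptions and are not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active at a position when its
    activation is strictly greater than the threshold (default 0.0).
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def format_density(density: float) -> str:
    """Format a density in [0, 1] as a percentage with three decimals."""
    return f"{density:.3%}"


# Synthetic example (not real data): 32 active positions out of 100,000.
sample = [0.0] * 99_968 + [1.0] * 32
print(format_density(activation_density(sample)))
```

Under this reading, a density of 0.032% corresponds to roughly 32 active positions per 100,000 sampled tokens.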