Explanations
instances of LaTeX formatting or figures in a document
Negative Logits
Token     Logit
弾        -0.17
ambre     -0.16
itar      -0.15
italic    -0.14
ठन        -0.14
itches    -0.14
лян       -0.14
amet      -0.14
elli      -0.13
PRS       -0.13
Positive Logits
Token     Logit
\         0.20
\         0.19
          0.17
ovel      0.16
~↵        0.15
olvers    0.15
Tun       0.14
center    0.14
576       0.14
lever     0.14
Activations Density: 0.017%