INDEX
Explanations
commas and quotation marks in varied contexts, suggesting a focus on textual punctuation
Negative Logits
utow      -0.17
OTES      -0.16
ekim      -0.16
serter    -0.15
ordes     -0.15
iente     -0.15
OR        -0.15
Jac       -0.14
italic    -0.14
ometr     -0.14
Positive Logits
adir      0.19
ib        0.18
Estr      0.17
ext       0.16
ullo      0.16
Ext       0.16
anza      0.16
Gang      0.15
Nut       0.15
Pur       0.15
Activations Density 0.034%