INDEX
Explanations
instances of punctuation or formatting marks in text
Negative Logits
Token       Logit
arakter     -0.15
å½¹         -0.14
msp         -0.14
/*č↵        -0.13
.TabStop    -0.13
leich       -0.13
mise        -0.12
ãĤ¤ãĥī      -0.12
contres     -0.12
-↵↵         -0.12
Positive Logits
Token       Logit
is          0.20
has         0.20
are         0.17
can         0.16
will        0.16
was         0.15
ling        0.15
cannot      0.15
could       0.15
must        0.15
Activations Density 0.149%