Explanations
phrases that indicate purpose or intent
Negative Logits
                -1.16
ſind            -1.13
betweenstory    -1.09
sizeCache       -1.03
<unused43>      -1.03
<unused23>      -1.03
<unused28>      -1.02
<unused41>      -1.02
<unused14>      -1.02
[@BOS@]         -1.02
Positive Logits
for             1.20
with            0.82
by              0.81
from            0.77
at              0.75
to              0.74
as              0.74
on              0.73
has             0.64
is              0.63
Activations Density 0.952%