INDEX
Explanations
missing or incomplete states
Negative Logits
Token   Value
i       0.38
ми      0.36
e       0.36
8       0.35
৬       0.35
the     0.34
iid     0.34
time    0.33
The     0.33
where   0.32
Positive Logits
Token   Value
a       0.51
on      0.51
was     0.47
to      0.45
one     0.43
an      0.43
it      0.42
be      0.40
out     0.36
of      0.34
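Feature dashboards usually build logit lists like the two above by projecting the feature's decoder direction through the model's unembedding matrix. They then rank vocabulary tokens by the resulting logit effect. The sketch below works under that assumption. The names `logit_effects` and `top_logits`, and the toy vectors, are hypothetical and are not taken from this page. It uses plain Python lists so it stays self-contained.

```python
def logit_effects(decoder_vec, unembed, vocab):
    """Logit effect of one feature direction on each vocabulary token.

    Hypothetical helper: decoder_vec has length d_model, and unembed[i][t]
    is entry (i, t) of the unembedding matrix W_U (d_model x vocab).
    """
    return {
        tok: sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        for t, tok in enumerate(vocab)
    }


def top_logits(effects, k):
    """Return (k most positive, k most negative) (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    positive = ranked[-k:][::-1]  # largest effects first
    negative = ranked[:k]         # most negative effects first
    return positive, negative


# Toy example with made-up numbers (d_model = 2, vocab of 4 tokens).
vocab = ["a", "on", "the", "i"]
decoder_vec = [1.0, 0.5]
unembed = [
    [0.4, 0.3, -0.1, -0.3],
    [0.2, 0.2, 0.0, -0.1],
]
pos, neg = top_logits(logit_effects(decoder_vec, unembed, vocab), k=2)
print([t for t, _ in pos], [t for t, _ in neg])  # ['a', 'on'] ['i', 'the']
```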
Activation Density: 0.581%
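Activation density is commonly reported as the percentage of token positions on which a feature's activation is above zero. The sketch below assumes that definition. The helper name `activation_density` and the toy data are hypothetical, and the page does not state which threshold it uses.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold.

    Assumption: "fires" means activation > threshold (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: the feature fires on 3 of 1000 tokens.
acts = [0.0] * 997 + [1.2, 0.4, 2.5]
print(f"{activation_density(acts):.3%}")  # 0.300%
```

Under this definition, a density of 0.581% means the feature is active on roughly 6 of every 1,000 tokens.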