Explanations
numerical values and their significance in a context
numbers and hyphens
Negative Logits
y    -0.28
a    -0.27
bas  -0.27
p    -0.27
is   -0.27
by   -0.26
w    -0.26
↵    -0.25
ja   -0.24
g    -0.24
Positive Logits
<unused41>  0.99
[@BOS@]     0.99
<unused79>  0.99
<unused17>  0.98
<unused28>  0.98
<unused14>  0.98
<unused42>  0.98
<unused47>  0.98
<unused43>  0.98
<unused3>   0.98
Activations Density 0.145%