INDEX
Explanations
connective words that build lists or sequences
Negative Logits
<unused41>   -1.59
myſelf       -1.58
<unused43>   -1.58
<unused74>   -1.58
<unused23>   -1.57
<unused14>   -1.57
<unused42>   -1.57
<unused51>   -1.57
[@BOS@]      -1.57
<unused8>    -1.57
Positive Logits
[token not shown]   1.48
,                   1.31
.                   1.30
[token not shown]   1.21
↵↵                  1.20
:                   1.19
(                   1.15
'                   1.11
[token not shown]   1.08
↵                   1.04
Activations Density 1.278%