Explanation: structural elements and symbols in code
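Negative logits
↵                       -1.50
_                       -0.85
(token not rendered)    -0.84
(                       -0.80
<                       -0.77
(token not rendered)    -0.76
(token not rendered)    -0.76
(token not rendered)    -0.71
(token not rendered)    -0.70
(token not rendered)    -0.68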
Positive logits
<unused14>    2.27
<unused68>    2.27
<unused52>    2.27
<unused3>     2.25
<unused79>    2.25
[@BOS@]       2.25
<unused8>     2.25
<unused16>    2.25
<unused17>    2.23
<unused21>    2.23
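The two logit lists are commonly read as a logit-lens view of the latent. Assuming, as is typical for SAE feature dashboards, that each score is the dot product of the latent's decoder direction with a token's unembedding vector, the lists are simply the k highest and k lowest scores over the vocabulary. The sketch below uses toy vectors; `logit_lens` and `top_and_bottom` are hypothetical names, and a real dashboard may apply scaling or normalization that this omits.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def logit_lens(decoder_dir: List[float],
               unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by dot(decoder direction, token's unembedding vector).

    Assumption: this is how the dashboard's logit values are derived.
    """
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(scores: Dict[str, float], k: int) -> Tuple[Scored, Scored]:
    """Return (k most positive, k most negative) tokens, strongest first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]          # lowest scores, most negative first
    positive = ranked[::-1][:k]    # highest scores, most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional "model" with a three-token vocabulary.
    scores = logit_lens([1.0, -0.5],
                        {"a": [2.0, 0.0], "b": [0.0, 2.0], "c": [1.0, 1.0]})
    print(top_and_bottom(scores, 2))
    # prints ([('a', 2.0), ('c', 0.5)], [('b', -1.0), ('c', 0.5)])
```

With a real vocabulary the two lists do not overlap; the toy vocabulary is just small enough that "c" appears in both.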
Activation density: 0.522%
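Activation density is usually reported as the percentage of sampled tokens on which the latent fires, so 0.522% corresponds to roughly 522 firing tokens per 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of zero; `activation_density` and its `threshold` parameter are illustrative, and the dashboard's actual sampling procedure is not stated here.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "firing" when activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # 522 firing tokens out of 100,000 sampled tokens.
    sample = [0.0] * 99_478 + [0.3] * 522
    print(f"{activation_density(sample):.3f}%")  # prints 0.522%
```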