INDEX
Explanations
function definitions and calls in code
`__` and starting tags
Negative Logits
queſta        -0.99
ſicht         -0.91
<unused41>    -0.87
<unused79>    -0.86
<unused52>    -0.86
<unused8>     -0.86
<unused23>    -0.86
<unused47>    -0.86
<unused14>    -0.86
[@BOS@]       -0.86
Positive Logits
(             0.87
__(           0.65
(             0.60
(             0.47
(\            0.41
$(            0.40
?(            0.40
if            0.38
left          0.38
((            0.38
Activations Density: 0.002%