INDEX
Explanations
structural elements in programming code
code blocks and syntax
Negative Logits
.
-0.43
\
-0.41
↵
-0.37
>
-0.36
\
-0.35
↵↵
-0.33
↵↵↵
-0.33
(
-0.31
<bos>
-0.31
{
-0.30
Positive Logits
"):
1.38
"]);
1.38
"];
1.34
'>
1.34
'])){
1.34
]:
1.33
'):
1.31
]
1.30
"]
1.29
")]
1.27
Activations Density 0.004%