Explanations
the start of a new section within the document
Followed by a question mark
Negative Logits
Token   Logit
++      -1.23
}}$}    -1.18
>\<^    -1.09
\<^     -1.09
)");    -1.07
}.      -1.07
".      -1.06
)";     -1.05
$_"     -1.03
$.      -1.03
Positive Logits
Token   Logit
<eos>   0.94
https   0.94
↵↵      0.94
↵       0.86
http    0.78
I       0.74
"       0.72
↵↵↵↵    0.71
        0.70
“       0.70
Activations Density: 0.123%