INDEX
Explanations
Instances of the "<bos>" token, which may mark the beginning of new content or a new section in the text
Negative Logits
)");
-0.97
)";
-0.93
']);
-0.88
}}$}
-0.88
"){
-0.88
."));
-0.87
`,
-0.84
')
-0.84
\"");
-0.84
"});
-0.84
Positive Logits
#
2.12
#
1.83
\#
1.77
.#
1.68
#
1.63
\#
1.60
(#
1.57
:#
1.52
)#
1.43
('#
1.42
Activations Density 0.194%