Explanations
mentions of programming constructs and error messages
Negative Logits
”—        -0.95
`;        -0.85
<strong>  -0.84
<eos>     -0.83
.’”       -0.83
.”.       -0.83
)”.       -0.81
”.        -0.80
—”        -0.78
;”        -0.77
Positive Logits
بيها       0.77
we         0.71
TODO       0.71
stuff      0.70
ourselves  0.68
'          0.67
*/         0.65
こっち     0.64
ppl        0.64
yg         0.63
Activation Density: 0.758%