Explanations
Metadata and copyright-related text in code
Negative Logits
Token    Logit
.        -2.91
.");     -2.61
.";      -2.56
.]       -2.52
.");     -2.50
.}       -2.45
.');     -2.45
.";      -2.44
.")      -2.36
.</      -2.34
Positive Logits
Token    Logit
<bos>    0.74
2        0.66
S        0.66
1        0.63
(blank)  0.61
(blank)  0.60
3        0.60
e        0.59
(blank)  0.58
A        0.57
Activations Density 4.580%