INDEX
Explanations
punctuation marks and sentence endings
Negative Logits
<unused43>    -1.11
<unused41>    -1.11
<unused23>    -1.10
<unused74>    -1.10
<unused14>    -1.10
<unused8>     -1.10
<unused16>    -1.09
<unused28>    -1.09
<unused3>     -1.09
[@BOS@]       -1.09
Positive Logits
https    0.94
http     0.89
...      0.82
Read     0.76
<eos>    0.74
...      0.73
This     0.69
https    0.69
The      0.69
…        0.66
Activations Density: 0.683%