Explanations
the beginning of new sections or topics in the text
Negative Logits
Token         Logit
surla         -0.98
niſſe         -0.97
queſta        -0.96
iſen          -0.90
mpagne        -0.90
ValueStyle    -0.90
iffance       -0.88
ſicht         -0.86
ſcher         -0.86
<unused14>    -0.85
Positive Logits
Token         Logit
<td>          0.37
But           0.36
<sub>         0.34
I             0.33
Q             0.33
<em>          0.32
<strong>      0.31
<code>        0.31
expect        0.31
2             0.31
Activations Density: 0.065%