INDEX
Explanations
Various formatting elements and special characters in text
Negative Logits
(unrendered token)  -0.39
(unrendered token)  -0.34
/  -0.34
,  -0.32
The  -0.32
A  -0.32
↵  -0.31
.  -0.30
once  -0.30
2  -0.29
Positive Logits
surla  0.96
<unused28>  0.96
<unused43>  0.96
<unused41>  0.96
<unused51>  0.96
<unused52>  0.96
[@BOS@]  0.96
<unused79>  0.96
<unused23>  0.96
<unused17>  0.96
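Lists like the two above are often produced by projecting a feature's decoder direction onto every vocabulary token and keeping the extremes. The sketch below assumes that method; the page itself does not say how its logits were computed. The names `decoder_row`, `unembed`, `vocab` and `logit_effects` are hypothetical, and NumPy is used only for the matrix product and sorting.

```python
import numpy as np

def logit_effects(decoder_row, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    Assumption: a feature's logit effect is its decoder direction
    (shape (d_model,)) multiplied by the unembedding matrix
    (shape (d_model, vocab_size)). All names here are hypothetical.
    """
    # One score per vocabulary token.
    effects = np.asarray(decoder_row) @ np.asarray(unembed)
    order = np.argsort(effects)  # indices in ascending order of effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
decoder_row = np.array([1.0, 0.0])
unembed = np.array([[0.5, -0.2, 0.9],
                    [0.0, 0.0, 0.0]])
neg, pos = logit_effects(decoder_row, unembed, ["a", "b", "c"], k=1)
print(neg, pos)  # prints [('b', -0.2)] [('c', 0.9)]
```

Many tied values at the top, like the repeated 0.96 entries above, would simply appear in whatever order `argsort` returns them.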
Activation Density: 0.004%
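An activation density of 0.004% corresponds to the feature firing on roughly 4 of every 100,000 tokens. The minimal sketch below assumes density means the fraction of token positions with activation strictly above a threshold of zero; the dashboard's exact definition and threshold are not stated here, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold.

    Assumption: 'density' counts activations strictly greater than
    the threshold (default 0.0).
    """
    activations = np.asarray(activations)
    # Mean of a boolean mask equals the fraction of True entries.
    return float(np.mean(activations > threshold))

# Example: 4 firing positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[[10, 2_000, 50_000, 99_999]] = 1.0
print(f"{activation_density(acts) * 100:.3f}%")  # prints 0.004%
```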