Explanations
empty quotation marks and syntax-related characters
Negative Logits
queſta      -1.24
niſſe       -1.21
<pad>       -1.20
[@BOS@]     -1.20
<unused68>  -1.20
iſchen      -1.20
<unused43>  -1.20
<unused41>  -1.20
<unused14>  -1.20
<unused28>  -1.20
Positive Logits
<eos>       0.49
(blank)     0.48
↵           0.40
↵↵          0.40
1           0.37
.           0.35
2           0.31
I           0.30
↵↵↵         0.30
(blank)     0.30
Activation Density: 0.000%