Explanations
numerical patterns or sequences
Negative Logits
Token            Logit
ConstraintMaker  -1.29
ſſung            -1.25
niſſe            -1.25
ſicht            -1.24
iſchen           -1.23
enablog          -1.19
ſei              -1.16
<unused68>       -1.16
bootstrapcdn     -1.16
<pad>            -1.16
Positive Logits
Token            Logit
↵↵               0.54
2                0.53
↵↵↵              0.48
1                0.48
↵↵↵↵             0.45
But              0.45
.                0.43
(                0.42
↵                0.41
I                0.40
Activations Density: 0.018%