INDEX
Explanations
instances of user-related inputs or commands, often formatted in a specific way
LaTeX math symbols
LaTeX and markdown prefixes
the first word at the start of a new segment (immediately after the beginning-of-sequence), with weaker activation for other very-early tokens.
Negative Logits
MessageTagHelper  -0.48
Савезне           -0.32
UnusedPrivate     -0.32
newBuilder        -0.30
HasColumnName     -0.28
tiéndose          -0.27
precum            -0.27
parsedMessage     -0.26
RTEE              -0.26
Földrajzportál    -0.26
Positive Logits
zwiſchen    1.13
<unused41>  1.05
<unused14>  1.05
<pad>       1.04
<unused23>  1.04
<unused8>   1.04
<unused17>  1.04
<unused16>  1.04
[@BOS@]     1.04
<unused3>   1.04
Activations Density: 1.341%