    Explanations

    instances of user-related inputs or commands, often formatted in a specific way

    LaTeX and markdown prefixes

    the first word at the start of a new segment (immediately after the beginning-of-sequence token), with weaker activation on other very early tokens.

    Negative Logits
    Token               Logit
    MessageTagHelper    -0.48
     Савезне            -0.32
    UnusedPrivate       -0.32
     newBuilder         -0.30
    HasColumnName       -0.28
    tiéndose            -0.27
     precum             -0.27
    parsedMessage       -0.26
    RTEE                -0.26
    Földrajzportál      -0.26
    Positive Logits
    Token               Logit
     zwiſchen           1.13
    <unused41>          1.05
    <unused14>          1.05
    <pad>               1.04
    <unused23>          1.04
    <unused8>           1.04
    <unused17>          1.04
    <unused16>          1.04
    [@BOS@]             1.04
    <unused3>           1.04
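
The two logit lists above are the extremes of a ranking of tokens by logit value. The sketch below shows one way to produce such lists from a token-to-value mapping. It uses a handful of the values listed above as sample data. The helper name `top_k` is hypothetical, and how the dashboard itself computes these values is not stated here.

```python
from typing import Dict, List, Tuple

# Sample values copied from the lists above; leading spaces are part of the token.
effects: Dict[str, float] = {
    "MessageTagHelper": -0.48,
    " Савезне": -0.32,
    "parsedMessage": -0.26,
    " zwiſchen": 1.13,
    "<unused41>": 1.05,
    "<pad>": 1.04,
}

def top_k(values: Dict[str, float], k: int, positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (positive=True) or smallest logit values."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]

most_promoted = top_k(effects, 2, positive=True)
most_suppressed = top_k(effects, 2, positive=False)

for token, value in most_suppressed + most_promoted:
    # repr() keeps a token's leading whitespace visible.
    print(f"{token!r:24} {value:+.2f}")
```

Printing tokens with `repr()` keeps leading spaces visible, which matters because a token with a leading space is a different vocabulary entry from the same text without one.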
    Activation Density: 1.341%
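
As a rough illustration of the density figure above, the sketch below computes the fraction of tokens on which a feature is active. It assumes that density means the share of tokens whose activation is above a threshold, zero by default. The function name `activation_density` and the toy data are hypothetical and do not reproduce the figure reported here.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly above `threshold`.

    Assumption: "active" means activation > threshold (0.0 by default).
    """
    if len(activations) == 0:
        raise ValueError("need at least one token")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Toy data: 1000 tokens, 13 of which activate the feature.
acts = [0.0] * 987 + [0.5] * 13
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this definition, a density near 1% means the feature fires on roughly one token in a hundred of the sampled text.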

    No Known Activations