INDEX
    Explanations

    introductory phrases and conditional statements

    phrases that start with articles or pronouns

    It detects the start of the model/assistant's output (the beginning-of-sequence or initial tokens of a generated reply).

    Negative Logits
    Token             Logit
    (not shown)       -0.80
    " snippetHide"    -0.80
    "ſicht"           -0.79
    " zwiſchen"       -0.79
    " Dieſe"          -0.78
    "<unused52>"      -0.77
    "<unused68>"      -0.77
    "<unused23>"      -0.77
    "<unused17>"      -0.77
    "<unused8>"       -0.77
    Positive Logits
    Token             Logit
    "The"             0.41
    " dizem"          0.37
    "You"             0.37
    "A"               0.32
    "At"              0.31
    "It"              0.29
    "Usually"         0.29
    "Your"            0.29
    " is"             0.28
    "Just"            0.28
    Activation Density: 0.001%

    No Known Activations