INDEX
    Explanations

    instructions

    This neuron responds to the header label “Instruction” and to adjoining words such as “and” and “Question”; that is, it detects the formatted instruction section of a prompt.

    Negative Logits
    тап  -0.07
     Од  -0.06
     KY  -0.06
    ้าย  -0.06
    (no visible token)  -0.06
     descended  -0.06
    ]="  -0.06
    (rec  -0.06
    (no visible token)  -0.06
     PyErr  -0.06
    Positive Logits
     Canary  0.07
    expense  0.06
    FORMATION  0.06
    leo  0.06
    anker  0.06
    aira  0.06
    .runners  0.06
     grands  0.06
     CDN  0.06
     olumsuz  0.06
    Act Density 0.013%

    No Known Activations