    Explanations

    The neuron activates on the special control tokens that mark conversation structure (e.g. “<|start_header_id|>”, “<|end_header_id|>”, speaker tags, and other header/footer markers).

    Negative Logits
    Token           Logit
    "(cls"          -0.07
    "[offset"       -0.07
    "Triple"        -0.07
    "Ě"             -0.07
    "уп"            -0.06
    "lopen"         -0.06
    "_game"         -0.06
    "(conf"         -0.06
    "Authorities"   -0.06
    " LIMIT"        -0.06
    Positive Logits
    Token           Logit
    " smlou"        0.07
    (not shown)     0.06
    "sen"           0.06
    "/place"        0.06
    (not shown)     0.06
    "-US"           0.06
    " and"          0.06
    "uteur"         0.06
    "HashCode"      0.06
    " nást"         0.06
    Activation Density: 0.096%
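
As a rough illustration of how a density figure like this one could be read, the sketch below treats it as the fraction of sampled tokens on which the neuron's activation exceeds a threshold. This definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; the page itself does not state how the value was computed.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition, for illustration only: density = active tokens / all tokens.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 96 active tokens out of 100,000 sampled tokens.
sample = [1.0] * 96 + [0.0] * (100_000 - 96)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.096% would mean the neuron fires on roughly one token in a thousand, which would fit a unit that responds only to a small set of structural markers.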

    No Known Activations