INDEX
    Explanations

    This neuron activates on tokens in parenthetical length-limit instructions, especially the “no more than X words” framing (e.g., the “(no … words)” constraint).

    Negative Logits
    "학기"  -0.06
    "icing"  -0.06
    ".vert"  -0.06
    " shown"  -0.06
    " сентября"  -0.06
    "Hola"  -0.06
    " Anthony"  -0.06
    "826"  -0.06
    " dealt"  -0.06
    "ilde"  -0.05

    Positive Logits
    "ledged"  0.07
    "Applied"  0.07
    "UTDOWN"  0.07
    " 작업"  0.07
    "pes"  0.07
    " refresh"  0.06
    "Completed"  0.06
    "(domain"  0.06
    "chio"  0.06
    "ğu"  0.06

    Activation Density: 0.021%

    No Known Activations