INDEX
    Explanations

    This neuron activates on numeric tokens that specify the requested word count in the prompt (e.g. the “200 0 words” figure).

    Negative Logits
     verified            -0.08
     centerY             -0.07
    ots                  -0.06
    von                  -0.06
     Ness                -0.06
     questionable        -0.06
     Bella               -0.06
    .EOF                 -0.06
     eligible            -0.06
     differentiation     -0.06
    Positive Logits
    “你                   0.06
    ourse                0.06
    (proj                0.06
     ustanov             0.06
    firstname            0.06
    paypal               0.06
    ilmington            0.06
     домов               0.06
    Usuario              0.06
    emploi               0.06
    Act Density 0.006%
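
For readers unfamiliar with the figure, the following is a minimal sketch of how an activation density like the one above could be computed. It assumes density means the percentage of sampled tokens on which the neuron's activation exceeds a threshold (zero here); the function name `activation_density` and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens with a
    non-zero (above-threshold) activation, expressed as a percentage.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 6 active tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_994 + [1.5] * 6
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this assumed definition, a density of 0.006% corresponds to roughly 6 active tokens per 100,000 sampled.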

    No Known Activations