INDEX
    Explanations

    Article previews

    The neuron activates almost exclusively on structural or control tokens (e.g., end-of-text or header markers); it detects metadata and chat-formatting tokens rather than natural-language content.

    Negative Logits
    928             -0.06
     enc            -0.06
    Abs             -0.06
    fb              -0.06
     FormData       -0.06
     SUS            -0.06
     caption        -0.06
     PUR            -0.06
    _Reference      -0.06
    tube            -0.06
    Positive Logits
    _HOUR           0.07
     هفته           0.07
     رسانه          0.07
                    0.07
    _escape         0.07
     mantle         0.06
    hong            0.06
    .BorderSize     0.06
                    0.06
    camel           0.06
    Act Density 0.008%

    No Known Activations