    Explanations

    code/web pages

    This neuron activates strongly on tokens in the system-instruction section, especially the first-person pronoun “I,” modal verbs such as “will,” and the surrounding punctuation. It appears to mark the model’s persona/setup prompt.

    Negative Logits
    Token            Logit
    "جام"            -0.06
    " languages"     -0.06
    (blank)          -0.06
    "Since"          -0.06
    "Legacy"         -0.06
    ".Contracts"     -0.06
    "licted"         -0.06
    " Recommend"     -0.06
    "\Service"       -0.06
    ".Attribute"     -0.06
    Positive Logits
    Token            Logit
    " Shut"          0.07
    "enticator"      0.07
    " cons"          0.06
    " виб"           0.06
    " #$"            0.06
    "τής"            0.06
    "IRON"           0.06
    " söz"           0.06
    "ABL"            0.06
    " financially"   0.06
    Activation Density: 0.002%

    No Known Activations