INDEX
    Explanations

    information and questions

    The neuron fires on the assistant’s self-descriptive meta-language: phrases in which it explains its role, capabilities, or guidelines (e.g. “my primary function is to provide…” or “as an AI language model”).

    Negative Logits
    Token           Logit
    "jd"            -0.08
    "/code"         -0.07
    "igmat"         -0.06
    "ليم"           -0.06
    "ajes"          -0.06
    "_CH"           -0.06
    " opposes"      -0.06
    " btc"          -0.06
    " організ"      -0.06
    "odel"          -0.06
    Positive Logits
    Token           Logit
    " dễ"           0.06
    "으나"          0.06
    " waste"        0.06
    " финансов"     0.06
    " Swap"         0.06
    "_TXT"          0.06
    ".snapshot"     0.06
    ".drawString"   0.06
    "PAY"           0.06
    "relevant"      0.06
    Act Density 0.077%

    No Known Activations