INDEX
    Explanations

    code and strings

    This neuron fires on the assistant’s self-referential disclaimers—particularly the “As an AI language model…” introduction and related statements of inability.

    instructions related to generating or modifying statements based on user input.

    Negative Logits
     Ax                   -0.07
     creepy               -0.07
    RV                    -0.07
     Patient              -0.07
     Cy                   -0.06
    "%                    -0.06
     Also                 -0.06
     sunglasses           -0.06
     cy                   -0.06
    มเต                   -0.06
    Positive Logits
    λά                    0.06
    enze                  0.06
    .accessToken          0.06
    /routes               0.06
    Options               0.06
    леж                   0.06
     cancelButtonTitle    0.06
    .getContent           0.06
     aspirations          0.06
    ي                     0.06
    Act Density 0.003%

    No Known Activations