EXPLANATION TYPE
    oai_token-act-pair
    Description: OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models"; modified by Johnny Lin to add new models and context windows.
    Author: OpenAI
    URL: https://github.com/hijohnnylin/automated-interpretability
    Settings: Default prompts from the main branch, strategy TokenActivationPair.
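The TokenActivationPair strategy shows the explainer model each token of an excerpt alongside its activation, normalized to a small integer scale, and asks it to summarize what the neuron responds to. The sketch below illustrates that input format only; the function names and exact rendering are illustrative assumptions, not the repository's actual API.

```python
# Minimal sketch of the TokenActivationPair input format (assumed layout,
# not the repository's real code): each token is paired with its activation,
# discretized to an integer 0-10 scale before being shown to the explainer.

def normalize(activations, max_activation):
    # Scale raw activations to integers in [0, 10].
    if max_activation <= 0:
        return [0] * len(activations)
    return [round(10 * a / max_activation) for a in activations]

def format_token_activation_pairs(tokens, activations):
    """Render one excerpt as tab-separated token<TAB>activation lines."""
    scaled = normalize(activations, max(activations))
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, scaled))

excerpt = format_token_activation_pairs(
    ["As", " a", " large", " language", " model"],
    [0.1, 0.0, 4.2, 8.4, 8.4],
)
print(excerpt)
```

An explainer prompt would concatenate several such excerpts and ask for a one-sentence explanation of the common pattern, which is what produces the explanation texts listed below.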
    Recent Explanations
    This neuron detects formatting and structural markup in the text (headings, emphasis/bold markers, section bullets and similar layout tokens).
    gpt-5-mini
    **Temperament:** Intelligent, eager to please
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 36853
    discussions centered on remote/hybrid work and return-to-office topics, including policies, practices, and collaboration for distributed teams.
    gpt-5
    work, future of work, hybrid work, remote work
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 14183
    Tokens that are part of the model/assistant's detailed explanatory responses (i.e., content words in the assistant's reply).
    gpt-5-mini
    pet stores. *Every dog owner who trims nails should
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 8413
    statements where the assistant refuses a request by citing safety rules, limits, or that it is "programmed" to be safe (i.e., refusal/safety-policy language).
    gpt-5-mini
    Safety Guidelines:** My core principles, as set by my
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 2185
    The neuron is essentially flagging the assistant's own "long-form" explanation turns (the multi-paragraph, bullet-list responses) as opposed to user utterances. In other words, it turns on for tokens in the model's detailed breakdowns.
    o4-mini
    widely available to the public. ↵↵Here's
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 1849
    assistant safety-refusal boilerplate: declarations that the AI cannot comply and references to its safety guidelines, ethical principles, and programming by its creators.
    gpt-5
    Safety Guidelines:** My core principles, as set by my
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 2185
    discourse connectors and prepositional function words that signal relationships and structure within explanations or requests.
    gpt-5
    you're hoping for from my attendance?""
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 5126
    sentences or passages where the assistant introduces itself or describes its identity, training, capabilities, and availability.
    gpt-5-mini
    widely available to the public. ↵↵Here's
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 1849
    Tokens that occur in the model's long explanatory responses (assistant-generated, contentful reply text).
    gpt-5-mini
    In 2023, while other distributions chase
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 7045
    sentences where the assistant refers to itself and issues safety/refusal disclaimers (e.g., "I am programmed..." / "As such, I cannot...").
    gpt-5-mini
    helpful AI assistant. As such, I **cannot**
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 2761
    This neuron detects the model’s self-description phrase “As a large language model” (and similar self-referential disclaimers).
    o4-mini
    Hi! As a large language model, I don'
    GEMMA-3-27B-IT
    40-GEMMASCOPE-2-RES-262K
    INDEX 165649
    The neuron strongly activates on the pattern where the model refers to itself as "a large language model," i.e., self-identification phrases of the form "As a large language model…".
    o4-mini
    As a large language model created by the Gemma team
    GEMMA-3-27B-IT
    40-GEMMASCOPE-2-RES-262K
    INDEX 119809
    the start of an assistant/model reply (the token marking the beginning of the model's response).
    gpt-5-mini
    ?<end_of_turn><start_of_turn>modelHi there! I'
    GEMMA-3-27B-IT
    26-GEMMASCOPE-2-TRANSCODER-262K
    INDEX 196575
    The neuron is identifying when the model is discussing its own nature, limitations, or role as an AI language model.
    claude-4-5-haiku
    Role:** As an AI, I am programmed to be
    GEMMA-3-27B-IT
    40-GEMMASCOPE-2-RES-262K
    INDEX 14865
    mentions of artificial intelligence, especially references to AI assistants, models, or AI-powered technologies.
    gpt-5
    Ask.ai is an AI-powered search engine specifically
    GEMMA-3-27B-IT
    40-GEMMASCOPE-2-RES-262K
    INDEX 7008
    The neuron strongly activates on the acronym “AI.”
    o4-mini
    Ask.ai is an AI-powered search engine specifically
    GEMMA-3-27B-IT
    40-GEMMASCOPE-2-RES-262K
    INDEX 7008
    This neuron spots the assistant’s self-descriptive policy-and-safety statements, especially “I cannot/absolutely cannot” refusals based on its programming constraints.
    o4-mini
    Safety:**My purpose is to provide a safe and
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 8163
    The neuron fires on self-referential AI identity phrases (e.g. “As a large language model,” “AI,” “model,” etc.).
    o4-mini
    As a large language model, I am **not**
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 100587
    statements where the speaker issues a disclaimer identifying themselves as an AI and noting limitations or non-advisory status.
    gpt-5
    Disclaimer:** I am an AI and cannot provide financial advice
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 4874
    The neuron activates on the self-referential "As a large language model" style disclaimer phrase.
    o4-mini
    <start_of_turn>modelAs a large language model, I
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 26409