
    Neuronpedia

    © Neuronpedia 2025
    EXPLANATION TYPE
    oai_token-act-pair
    Description
    OpenAI's Automated Interpretability method, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to support new models and context windows.
    Author
    OpenAI
    URL
    https://github.com/hijohnnylin/automated-interpretability
    Settings
    Default prompts from the main branch, using the TokenActivationPair strategy.
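    As a rough illustration of what a token-activation-pair prompt looks like, the sketch below scales raw activations to integers from 0 to 10 and writes one token per line, followed by a tab and its scaled activation. The function names, the floor-based rounding, and the sample values are assumptions made for illustration; they are not taken from the repository linked above.

```python
import math
from typing import List, Sequence


def normalize_activations(
    activations: Sequence[float], max_activation: float, scale: int = 10
) -> List[int]:
    """Map raw activations to integers in [0, scale].

    Assumption: negative activations clamp to 0 and values are floored.
    """
    if max_activation <= 0:
        # Nothing to scale against, so treat every token as inactive.
        return [0 for _ in activations]
    return [
        min(scale, math.floor(scale * max(0.0, a) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(
    tokens: Sequence[str], activations: Sequence[float], max_activation: float
) -> str:
    """Render one 'token<TAB>activation' pair per line."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    normalized = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, normalized))


if __name__ == "__main__":
    # Hypothetical activations for a fragment of one snippet on this page.
    print(format_token_activation_pairs([" The", " Core", " Problem"], [0.0, 2.5, 5.0], 5.0))
```

    An explainer model would then be shown text in this form and asked to summarize what the high-activation tokens have in common.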
    Recent Explanations
    It detects self-referential statements where the model talks about itself (first-person identity, creation, capabilities, or "my"/"I" statements).
    gpt-5-mini
    ):**  The body is incredibly efficient. When it
    GEMMA-3-4B-IT
    22-GEMMASCOPE-2-RES-16K
    INDEX 9135
    The neuron detects emphatic or evaluative modifier words—strong adjectives and adverbs like “very,” “important,” “positive,” “negative,” etc.
    o4-mini
    - Accept a challenge with a positive attitude↵  T
    GEMMA-3-4B-IT
    22-GEMMASCOPE-2-RES-16K
    INDEX 7345
    structured, list-style formatting—especially numbered items with bolded headings, product/brand names, acronyms, and section dividers.
    gpt-5
    voices, not the most luxurious feel.↵    *
    GEMMA-3-4B-IT
    22-GEMMASCOPE-2-RES-16K
    INDEX 813
    This neuron activates on copular or auxiliary “to be” verbs (is/are/will be/etc.), flagging statements that define or assert something.
    o4-mini
    . Automated builds and tests are run to detect integration issues
    GEMMA-3-4B-IT
    22-GEMMASCOPE-2-RES-16K
    INDEX 424
    tokens containing the letter “Z” (upper- or lowercase), especially when it appears at the start of names or terms.
    gpt-5
    stability.↵* **Zephyr Holt:** "Ze
    GEMMA-3-4B-IT
    22-GEMMASCOPE-2-RES-16K
    INDEX 1339
    The neuron fires strongly on genre labels—especially darker ones—such as “Horror,” “Mystery,” “dark,” or “scary.”
    o4-mini
    , Mystery, Romance, Horror, Slice of Life,
    GEMMA-3-4B-IT
    22-GEMMASCOPE-2-RES-16K
    INDEX 3759
    statements and headings that frame structured analysis or troubleshooting, signaling problem identification, core issues, challenges, breakdowns, and considerations.
    gpt-5
    **1. The Core Problem: Copyright and Removal**
    GEMMA-3-4B-IT
    22-GEMMASCOPE-2-RES-16K
    INDEX 433
    Sections that present a problem-analysis and solution/troubleshooting advice (headings like “The Core Problem,” reasons, and what to do).
    gpt-5-mini
    **1. The Core Problem: Copyright and Removal**
    GEMMA-3-4B-IT
    22-GEMMASCOPE-2-RES-16K
    INDEX 433
    This neuron spots words and phrases that introduce or label problems—like “issue,” “breakdown,” “core problem,” or other signals that a difficulty is being explained.
    o4-mini
    **1. The Core Problem: Copyright and Removal**
    GEMMA-3-4B-IT
    22-GEMMASCOPE-2-RES-16K
    INDEX 433
    strongly negative, complaint-style review language indicating dissatisfaction with a product, service, or experience.
    gpt-5
     Kitchen. Not recommended at all. Lethargic service
    GEMMA-2-9B-IT
    20-GEMMASCOPE-RES-16K
    INDEX 9764
    first-person, autobiographical statements expressing personal experience, thoughts, or preferences within informal explanations or advice.
    gpt-5
     size smalls. We used disposables about half the
    GEMMA-2-9B-IT
    20-GEMMASCOPE-RES-16K
    INDEX 14915
    critical evaluations of media that call out contrivance or unrealistic, overly neat/predictable elements, often marked by intensifiers and evaluative qualifiers.
    gpt-5
    , a dark (sometimes ludicrously so) crime saga
    GEMMA-2-9B-IT
    20-GEMMASCOPE-RES-16K
    INDEX 15153
    humorous, tongue-in-cheek content and references to comedy, including jokes, quips, puns, parody, and self-deprecating roasts.
    gpt-5
     columnist, tongue firmly in cheek. If you recall,
    GEMMA-2-9B-IT
    20-GEMMASCOPE-RES-16K
    INDEX 8205
    finance-related terminology, especially around lending, underwriting, and credit risk/scoring.
    gpt-5
    by which lenders assess the creditworthiness of a borrower and
    GPT-OSS-20B
    11-RESID-POST-AA
    INDEX 117529
    statements expressing uncertainty or lack of knowledge, such as noting information is unknown, unknowable, unclear, scarce, or not readily ascertainable.
    gpt-5
     group was really trained is lost or has<end_of_turn>↵
    GEMMA-2-9B-IT
    20-GEMMASCOPE-RES-16K
    INDEX 1527
    first-person, conversational answer passages in interview-style text where the speaker explains, opines, or describes their process and experiences.
    gpt-5
    normal’. Basically it’s the story of Caroline’
    GEMMA-2-9B-IT
    20-GEMMASCOPE-RES-16K
    INDEX 5210
    mentions of philosophy and philosophical inquiry, including discourse about studying, teaching, or discussing it.
    gpt-5
    ated↵↵Whenever I talk philosophy with a “freed slave
    GEMMA-2-9B-IT
    20-GEMMASCOPE-RES-16K
    INDEX 15189
    formal disclaimer and limitation language that negates capabilities, promises, rights, or responsibilities (e.g., “will not,” “does not,” “only,” “cannot”)
    gpt-5
     Shelves app. It will not erase your previous collection.
    GEMMA-2-9B-IT
    20-GEMMASCOPE-RES-16K
    INDEX 9128
    technical or metadata-like tokens—acronyms, formal category/league labels, file/module paths, dates, and numerals typical of encyclopedic, legal, or programming text
    gpt-5
    Though brain is a very small part of the body,
    GEMMA-2-9B-IT
    20-GEMMASCOPE-RES-16K
    INDEX 3569
    first-person, in-the-moment narration that emphasizes immediate time and place (live reporting or present-tense diary-style updates).
    gpt-5
     dog pen. I got here, along<end_of_turn>↵
    GEMMA-2-9B-IT
    20-GEMMASCOPE-RES-16K
    INDEX 14516