EXPLANATION TYPE
    oai_token-act-pair
    Description
    OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to support additional models and context windows.
    Author
    OpenAI
    URL
    https://github.com/hijohnnylin/automated-interpretability
    Settings
    Default prompts from the main branch, strategy TokenActivationPair.
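As a rough illustration of what a token-activation-pair strategy gives the explainer model, the sketch below pairs each token with a discretized activation level and renders one pair per line. The function names, the 0 to 10 scale, and the tab separator are assumptions made for this example and are not taken from this page; the repository linked above is the authority on the actual prompt format.

```python
import math
from typing import List

MAX_LEVEL = 10  # assumed top of the discretized activation scale


def normalize_activations(activations: List[float], max_activation: float) -> List[int]:
    """Map raw activations to integers in 0..MAX_LEVEL; negative values clamp to 0."""
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(MAX_LEVEL, int(math.floor(MAX_LEVEL * max(0.0, a) / max_activation)))
        for a in activations
    ]


def format_token_activation_pairs(
    tokens: List[str], activations: List[float], max_activation: float
) -> str:
    """Render one 'token<TAB>level' line per token (hypothetical layout)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    levels = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{lvl}" for tok, lvl in zip(tokens, levels))


if __name__ == "__main__":
    # Made-up activations for a snippet like the ones listed on this page.
    print(format_token_activation_pairs(["isn", "'t", " a"], [0.0, 4.0, 1.0], 4.0))
```

Discretizing keeps the prompt compact and lets the explainer model compare activation strength across tokens without reasoning about raw floating-point values.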
    Recent Explanations
    The neuron is picking out slot‐machine jargon and feature headings (e.g. “slots,” “RTP,” “volatility,” “max win,” “bonus features,” etc.).
    o4-mini
    ride.This isn't a slot for small
    GEMMA-3-27B-IT
    16-GEMMASCOPE-2-RES-262K
    INDEX 5583
    This neuron detects mentions of large language models and related training processes.
    o4-mini
    training:** The goal isn't to memorize the data
    GEMMA-3-27B-IT
    16-GEMMASCOPE-2-RES-262K
    INDEX 5854
    the apostrophe character in English contractions.
    gpt-5
    training:** The goal isn't to memorize the data
    GEMMA-3-27B-IT
    16-GEMMASCOPE-2-RES-262K
    INDEX 5854
    contractions and possessives marked by apostrophes.
    gpt-5
    ride.This isn't a slot for small
    GEMMA-3-27B-IT
    16-GEMMASCOPE-2-RES-262K
    INDEX 5583
    explicit sexual content and vulgar sexual terms.
    gpt-5-mini
    of short videos that went viral·Assisted in
    QWEN2.5-7B-IT
    15-RESID-POST-AA
    INDEX 1654
    The neuron detects the main subject or primary technical noun of a question (the key topic word being asked about).
    gpt-5-mini
    ideas how to make an object keyframe it's self
    QWEN2.5-7B-IT
    15-RESID-POST-AA
    INDEX 10784
    This neuron detects user requests for ideas—words and phrases asking for app/startup/website/business ideas or suggestions.
    gpt-5-mini
    <|im_start|>assistantHere are some ideas for an app that
    QWEN2.5-7B-IT
    15-RESID-POST-AA
    INDEX 14207
    Detecting role‑play or instruction prompts that directly address the model (second‑person setup statements like "imagine/you are..." defining a role or task).
    gpt-5-mini
    userimagine you are project manager write a project
    QWEN2.5-7B-IT
    15-RESID-POST-AA
    INDEX 13865
    This neuron detects text that contains many grammatical and spelling errors or instructions asking for intentionally poor/bad grammar.
    gpt-5-mini
    icas y <|im_end|><|im_start|>assistant¡Hola amigos
    QWEN2.5-7B-IT
    15-RESID-POST-AA
    INDEX 16585
    specialized, domain-specific technical jargon and formal terminology, especially multi-syllabic scientific or medical terms and nomenclature.
    gpt-5
    locations in areas of spondylosis, which
    QWEN2.5-7B-IT
    15-RESID-POST-AA
    INDEX 46357
    This neuron detects formatting and structural markup in the text (headings, emphasis/bold markers, section bullets and similar layout tokens).
    gpt-5-mini
    **Temperament:** Intelligent, eager to please
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 36853
    discussions centered on remote/hybrid work and return-to-office topics, including policies, practices, and collaboration for distributed teams.
    gpt-5
    work, future of work, hybrid work, remote work
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 14183
    Tokens that are part of the model/assistant's detailed explanatory responses (i.e., content words in the assistant's reply).
    gpt-5-mini
    pet stores. *Every dog owner who trims nails should
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 8413
    statements where the assistant refuses a request by citing safety rules, limits, or that it is "programmed" to be safe (i.e., refusal/safety-policy language).
    gpt-5-mini
    Safety Guidelines:** My core principles, as set by my
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 2185
    The neuron is essentially flagging the assistant’s own “long‐form” explanation turns (the multi‐paragraph, bullet‐list responses) as opposed to user utterances. In other words, it turns on for tokens in the model’s detailed breakdowns.
    o4-mini
    widely available to the public. ↵↵Here's
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 1849
    assistant safety-refusal boilerplate: declarations that the AI cannot comply and references to its safety guidelines, ethical principles, and programming by its creators.
    gpt-5
    Safety Guidelines:** My core principles, as set by my
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 2185
    discourse connectors and prepositional function words that signal relationships and structure within explanations or requests.
    gpt-5
    you're hoping for from my attendance?""
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 5126
    sentences or passages where the assistant introduces itself or describes its identity, training, capabilities, and availability.
    gpt-5-mini
    widely available to the public. ↵↵Here's
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 1849
    Tokens that occur in the model's long explanatory responses (assistant-generated, contentful reply text).
    gpt-5-mini
    In 2023, while other distributions chase
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 7045
    sentences where the assistant refers to itself and issues safety/refusal disclaimers (e.g., "I am programmed..." / "As such, I cannot...").
    gpt-5-mini
    helpful AI assistant. As such, I **cannot**
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 2761