    Neuronpedia

    © Neuronpedia 2025
    EXPLANATION TYPE
    oai_token-act-pair
    Description
    OpenAI's automated interpretability method from the paper "Language models can explain neurons in language models", modified by Johnny Lin to support new models and context windows.
    Author
    OpenAI
    URL
    https://github.com/hijohnnylin/automated-interpretability
    Settings
    Default prompts from the main branch, using the TokenActivationPair strategy.
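    A minimal sketch of what a token-activation-pair prompt fragment might look like, assuming the convention described in the paper: each token is listed next to its activation, with activations rescaled to integers from 0 to 10 and negative values mapped to 0. The function names, the tab separator, and the use of floor rounding are illustrative assumptions, not code copied from the repository linked above.

```python
import math
from typing import Sequence


def normalize_activations(activations: Sequence[float], max_activation: float) -> list[int]:
    """Rescale raw activations to integers in 0..10 (assumed convention).

    Negative activations map to 0. A non-positive max_activation maps everything to 0.
    """
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(10, math.floor(10 * max(0.0, a) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(
    tokens: Sequence[str],
    activations: Sequence[float],
    max_activation: float,
) -> str:
    """Render one 'token<TAB>activation' line per token (hypothetical layout)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    normalized = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{val}" for tok, val in zip(tokens, normalized))


if __name__ == "__main__":
    # Made-up tokens and activations, for illustration only.
    print(format_token_activation_pairs(["I", " am", " an", " AI"], [0.0, 1.0, -0.5, 4.0], 4.0))
```

    An explainer model would be shown text in this form for a feature's top-activating snippets and asked to summarize what the high-valued tokens have in common.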
    Recent Explanations
    mentions of files and code assets—filenames with extensions, paths, and environment/config variables—within programming or technical snippets.
    gpt-5
    )↵↵The input.txt is read as:↵↵
    GEMMA-2-9B-IT
    20-GEMMASCOPE-RES-131K
    INDEX 42621
    self-referential disclaimers where the assistant identifies itself as an AI language model and explains limitations or refusal to comply.
    gpt-5
    request. As a language model AI, I am designed
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 106122
    instructions that set up role‑play/jailbreak personas and task constraints (e.g., unfiltered “AIM” scenarios), as well as numbered requests for alternative expressions or synonyms.
    gpt-5
    ↵↵what are 20 other expressions for "an end
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 113059
    sentences or clauses containing first-person pronouns (especially "I" / "My") — i.e., author statements about their actions or issues.
    gpt-5-mini
    instead of InvokeMember. I've tried the InvokeMember
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 15110
    The neuron mainly detects substrings of proper nouns and other uncommon named-entity-like tokens (fragmented names and rare/foreign words).
    gpt-5-mini
    Q:↵↵Why Dothraki's hut get burned so
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 108764
    the neuron detects numeric tokens and numeric/measurement-related elements (numbers, dates, percentages, units).
    gpt-5-mini
    BSs from a joint perspective of engineering, legal and
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 102675
    tokens that are database schema elements and structural identifiers (table names, column names, headers, and related DB/code identifiers).
    gpt-5-mini
    is one switch, CISCO SB SGE2010
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 61155
    tokens marking the assistant role or the start of an assistant response.
    gpt-5-mini
    (async)<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵Here's a
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 110386
    finds modal and auxiliary verbs (words expressing ability, necessity, or possibility such as can, must, will, be).
    gpt-5-mini
    `filter` function. This function takes two arguments:
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 52737
    Tokens that are part of the assistant's generated output (i.e., the assistant role / response text).
    gpt-5-mini
    gangster rap<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵Verse 1
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 12399
    Tokens that mark the assistant/response header and related system metadata (i.e., assistant role and start/end header tokens).
    gpt-5-mini
    that.<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵I'm sorry,
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 12412
    The neuron detects code-like syntax tokens and programming-language keywords (i.e., places in the text that look like source code).
    gpt-5-mini
    !↵Sports:↵* Soccer↵* Indoor cricket↵*
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 73907
    This neuron detects instruction/request verbs (imperative task words like "create", "write", "design", "teach", "make") that signal a user asking the model to perform a task.
    gpt-5-mini
    . Your task is to create a step-by-step model
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 27604
    tokens representing numeric values, monetary amounts, quantities, and time units (numbers, $/NRE, thousands, weeks, etc.).
    gpt-5-mini
    C: NRE=$100,000, Unit cost
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 988
    The neuron detects named entities—proper nouns like people, organizations, places, dates and other capitalized titles.
    gpt-5-mini
    For-eign↵Relations Committee,↵
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 24423
    The neuron detects dismissive or minimizing language about mental-health problems that urges simplistic self-control instead of acknowledging real distress.
    gpt-5-mini
    not something that you can simply "snap out of"
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 119505
    mentions of mobile app (or software) development requests, specifications, and price/estimate or project-planning language.
    gpt-5-mini
    ball park numbers to ensure the project can be started with
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 5252
    Detects short impersonal third-person pronouns used as sentence subjects (the neutral "it"-type subject).
    gpt-5-mini
    .↵↵In some cases, it is beneficial to have germ
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 127834
    This neuron detects mentions of central characters (proper names) and strong narrative actions—tokens that mark who is acting in the story.
    gpt-5-mini
    to McAndrews, Tom kidnaps her from the
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 121390
    The neuron detects short name-like or handle-like tokens (author names, usernames, or blog/site handles).
    gpt-5-mini
    orth-it/↵======↵nostromo↵This is bad advice
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 88737