
    Neuronpedia

    © Neuronpedia 2025
    EXPLANATION TYPE
    oai_token-act-pair
    Description
    OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
    Author
    OpenAI
    URL
    https://github.com/hijohnnylin/automated-interpretability
    Settings
    Default prompts from the main branch, using the TokenActivationPair strategy.
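    As a rough illustration of what a token-activation-pair prompt format looks like, the sketch below renders each token next to a discretized activation, one pair per line. It assumes the scheme described in the paper (activations clamped at zero, scaled by the neuron's maximum activation, and floored to an integer from 0 to 10). The function names, the tab separator, and the sample values are hypothetical and are not taken from the repository linked above.

```python
from typing import Sequence

MAX_NORMALIZED = 10  # assumed top of the discrete activation scale


def normalize_activations(
    activations: Sequence[float], max_activation: float
) -> list[int]:
    """Map raw activations to integers in 0..10; negative values clamp to 0."""
    if max_activation <= 0:
        # Nothing meaningful to scale against, so report no activation.
        return [0 for _ in activations]
    return [
        min(MAX_NORMALIZED, int(MAX_NORMALIZED * max(a, 0.0) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(
    tokens: Sequence[str], activations: Sequence[float], max_activation: float
) -> str:
    """Render one "token<TAB>activation" pair per line (hypothetical layout)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    normalized = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, normalized))


if __name__ == "__main__":
    # Made-up activation values, for illustration only.
    sample_tokens = [" in", " 1949", " at"]
    sample_activations = [0.0, 2.0, 1.0]
    print(format_token_activation_pairs(sample_tokens, sample_activations, 2.0))
```

    An explainer model would be shown records in a layout like this and asked to summarize what the high-activation tokens have in common.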
    Recent Explanations
    The neuron fires on the model’s self-descriptive safety/disclaimer statements (e.g. “I am programmed to be a safe and helpful AI assistant”).
    o4-mini
    a safe and helpful AI assistant. As such, I
    GEMMA-3-27B-IT
    16-GEMMASCOPE-2-RES-262K
    INDEX 7539
    explicit date and year expressions, especially numerals and month names in timestamps.
    gpt-5
    in 1949 at Edwards Air Force Base
    GEMMA-3-27B-IT
    16-GEMMASCOPE-2-RES-262K
    INDEX 1060
    This neuron detects tokens that are floating-point numeric strings (numbers with a decimal point).
    gpt-5-mini
    uploads/2016/06/long
    GEMMA-3-27B-IT
    16-GEMMASCOPE-2-RES-262K
    INDEX 7139
    The neuron detects date/time tokens and explicit temporal references (months, days, years, and timestamps).
    gpt-5-mini
    , 2023.  I'm
    GEMMA-3-27B-IT
    16-GEMMASCOPE-2-RES-262K
    INDEX 3502
    The neuron is detecting numeric tokens and punctuation used in dates (e.g. year, month, day numbers and their separators).
    o4-mini
    Today's date is Friday, [current date]. I
    LLAMA3.3-70B-IT
    50-RESID-POST-GF
    INDEX 38333
    tokens that are digits or parts of date/time strings (numbers, years, and date fragments).
    gpt-5-mini
    of my knowledge cutoff date of September 2021,
    LLAMA3.3-70B-IT
    50-RESID-POST-GF
    INDEX 34344
    mentions of dates or date-related phrases (e.g., years, months, "current date", "knowledge cutoff").
    gpt-5-mini
    Today's date is Friday, [current date]. I
    LLAMA3.3-70B-IT
    50-RESID-POST-GF
    INDEX 38333
    the neuron detects short cause-and-solution statement pairs phrased like "X is due to Y. The solution is to Z."
    gpt-5-mini
    the grass wet is to wait for the rain to stop
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 120611
    mentions of “llama” (especially LLaMA-related model or library names) in text.
    gpt-5
    to call the `llama_model.predict()` method,
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 18961
    This neuron responds to uppercase acronyms, initialisms, or all-caps letter sequences (capitalized token fragments).
    gpt-5-mini
    is called the PROTECTS Initiative, PC Magazine reports
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 55221
    It detects mentions of the Llama language model name (and its letter-case/variant tokenizations).
    gpt-5-mini
    am based on the Llama language model, which is
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 5159
    lines that pose questions or question-heading phrases (especially starting with interrogative words like who/what/how/where).
    gpt-5-mini
    this HS code?**↵↵This code is used for
    GEMMA-3-27B-IT
    16-GEMMASCOPE-2-RES-262K
    INDEX 109552
    tokens representing years, dates, or other multi-digit numeric sequences (e.g., "2023", "2015").
    gpt-5-mini
    (2022 US Data):**  #
    GEMMA-3-27B-IT
    16-GEMMASCOPE-2-RES-262K
    INDEX 86344
    mentions of political/government institutions, offices, and election/representation language (e.g., served, elected, assembly, presidency).
    gpt-5-mini
    * (landowners), served as the legislative body,
    GEMMA-3-27B-IT
    16-GEMMASCOPE-2-RES-262K
    INDEX 27686
    tokens that are part of user instructions or explicit task/request prompts (i.e., directive phrases asking the model to do something).
    gpt-5-mini
    introduction to short story<end_of_turn>↵<start_of_turn>model↵Okay,
    GEMMA-3-27B-IT
    16-GEMMASCOPE-2-RES-262K
    INDEX 7908
    the presence of numeric tokens and arithmetic/math expressions (numbers and computation-related symbols) in the text.
    gpt-5-mini
    85344 = 853
    GEMMA-3-27B-IT
    53-GEMMASCOPE-2-RES-262K
    INDEX 2974
    Words that express strong negative impact, danger, or sensational severity (e.g., disaster/havoc/doom-type terms).
    gpt-5-mini
    pathogen that wreaks havoc in the livestock industry of
    QWEN2.5-7B-IT
    11-RESID-POST-AA
    INDEX 87140
    It detects headings, titles, or other section-start/heading tokens that mark the start of a new block or prominent label.
    gpt-5-mini
    Life is noisy and confusing↵There is so much going
    QWEN2.5-7B-IT
    11-RESID-POST-AA
    INDEX 25109
    This neuron detects questions—tokens and turns that are part of user (or conversational) interrogative utterances.
    gpt-5-mini
    to happen soon?<|im_end|>↵<|im_start|>assistant↵As an
    QWEN2.5-7B-IT
    11-RESID-POST-AA
    INDEX 61872
    Tokens marking the assistant's reply (the assistant role / assistant message starts).
    gpt-5-mini
    <|im_end|>↵<|im_start|>assistant↵Claro! Aqui
    QWEN2.5-7B-IT
    11-RESID-POST-AA
    INDEX 85710