EXPLANATION TYPE
    oai_token-act-pair
    Description
    OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
    Author
    OpenAI
    URL
    https://github.com/hijohnnylin/automated-interpretability
    Settings
    Default prompts from the main branch, strategy TokenActivationPair.
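As a rough illustration of what a token-activation-pair prompt looks like, the sketch below renders each token next to a discretized activation value. It is not the code from the repository above. The 0-10 integer scale, the tab separator, and the names `normalize` and `format_token_activation_pairs` are assumptions made for this example, and the activation values in the usage note are invented.

```python
from typing import List, Optional

MAX_NORMALIZED = 10  # assumed top of the discretized activation scale


def normalize(activations: List[float], max_activation: float) -> List[int]:
    """Clip negative values to zero and scale to integers in [0, MAX_NORMALIZED]."""
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(MAX_NORMALIZED, int(MAX_NORMALIZED * max(a, 0.0) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(
    tokens: List[str],
    activations: List[float],
    max_activation: Optional[float] = None,
) -> str:
    """Render one 'token<TAB>activation' line per token (hypothetical layout)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    if max_activation is None:
        # Fall back to the largest activation in this record.
        max_activation = max(activations, default=0.0)
    scaled = normalize(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, scaled))
```

For example, calling `format_token_activation_pairs([" fly", " through", " the"], [0.0, 4.0, 2.0])` produces three lines pairing each token with 0, 10, and 5 respectively. An explainer model would then be asked to summarize which tokens carry the high values.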
    Recent Explanations
    mentions of genetic variants and molecular biology assay components within technical scientific contexts.
    gpt-5
    -antitrypsin gene probe. In eight patients with
    GEMMA-2-2B
    0-CLT-HP
    INDEX 478
    mentions of CEOs and executive leadership in corporate news or forward-looking business statements.
    gpt-5
    , and other forward-looking statements in these remarks,
    GEMMA-2-2B
    0-CLT-HP
    INDEX 437
    The neuron detects capitalized named-entity tokens (proper nouns like people, places, teams, organizations).
    gpt-5-mini
    at St James' Park, after which he said:
    GPT2-SMALL
    8-RES-JB
    INDEX 55
    commas that mark clause breaks or parenthetical/afterthought phrases within a sentence.
    gpt-5
    at St James' Park, after which he said:
    GPT2-SMALL
    8-RES-JB
    INDEX 55
    The neuron detects language describing the formation or execution of habits and routine behaviors.
    o4-mini
    behaviors, these behaviors often become habitual and so routine that
    LLAMA3.1-8B-IT
    19-RESID-POST-AA
    INDEX 99130
    This neuron responds to list enumeration markers, in particular digits (and their trailing punctuation) that denote numbered list items.
    o4-mini
    Technical breakout10. Economic recovery↵↵Events before a
    LLAMA3.1-8B-IT
    19-RESID-POST-AA
    INDEX 59638
    This neuron responds to dynamic movement verbs—especially those describing traversing through an environment (e.g. “fly through the waters”).
    o4-mini
    eat with them or fly through the waters like one of
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 59638
    the start of an assistant's reply or the beginning of a structured assistant response (tokens marking the assistant role/response).
    gpt-5-mini
    it?<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵There could be several
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 101129
    This neuron activates on topical content words—concrete nouns and domain-specific terms (e.g., objects, places, activities, and technical or subject-specific vocabulary).
    gpt-5-mini
    i dont want to have people in my painting<|eot_id|><|start_header_id|>
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 123331
    It detects the start of assistant replies / tokens marking an assistant speaker turn in the conversation.
    gpt-5-mini
    you<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵Hello! I'm Assistant
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 52882
    Tokens at the start of an assistant-generated reply (the boundary/marker indicating a model/assistant response).
    gpt-5-mini
    ?<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵There are several cem
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 122494
    technical or domain-specific terminology (jargon) — i.e., tokens from technical descriptions, code, networking, or scientific/patent language.
    gpt-5-mini
    appropriate for your internal network, but make sure it is
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 31547
    spots meta-commentary about repetition or rehashing of arguments or topics (mentions that something is being repeated or raised again).
    gpt-5-mini
    rolled his eyes. Not this again. The anti-A
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 122273
    This neuron detects conversational greetings and salutations (tokens used to say “hello” or open a conversation).
    gpt-5-mini
    <|start_header_id|>assistant<|end_header_id|>↵↵Hello! Is there something specific you
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 92688
    instances where the text signals prior occurrence or familiarity (that something or someone has been experienced or seen before).
    gpt-5-mini
    , regardless of whether it has to do with sex,
    LLAMA3.1-8B-IT
    19-RESID-POST-AA
    INDEX 30532
    the neuron responds to repeated or boilerplate text—tokens that appear many times in a duplicated/templated phrase or repeatedly reiterated sentence fragments.
    gpt-5-mini
    close to being fully grown, but he just needed a
    LLAMA3.1-8B-IT
    19-RESID-POST-AA
    INDEX 31284
    This neuron detects self-referential mentions of the assistant’s identity (e.g. “AI language model”).
    o4-mini
    assistant<|end_header_id|>↵↵As an AI language model, my knowledge
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 85651
    the word "science" in contexts related to science fiction.
    claude-4-5-sonnet
       *   **Der Wissenschaftler:** Nutzt Intelligen
    GEMMA-3-27B-IT
    13-GEMMASCOPE-2-TRANSCODER-262K
    INDEX 165940
    key technical or proper terms that stand out in structured text (often emphasized in lists, tables, or quotes)
    gpt-5
    Magenta, Yellow, Key (black) | The
    GPT-OSS-20B
    15-RESID-POST-AA
    INDEX 22
    technical or domain-specific vocabulary and key terminology appearing in academic and professional documentation.
    claude-4-5-haiku
    Magenta, Yellow, Key (black) | The
    GPT-OSS-20B
    15-RESID-POST-AA
    INDEX 22