EXPLANATION TYPE
    oai_token-act-pair
    Description
    OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
    Author
    OpenAI
    URL
    https://github.com/hijohnnylin/automated-interpretability
    Settings
    Default prompts from the main branch, strategy TokenActivationPair.
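As a rough illustration of what a token-activation-pair prompt could look like, the sketch below pairs each token with a discretized activation, one pair per line. It assumes activations are clamped at zero, rescaled to integers from 0 to 10 relative to the record's maximum, and separated from the token by a tab. The function names `normalize_activations` and `format_token_activation_pairs` are hypothetical and are not taken from the repository, whose actual prompt format may differ.

```python
import math
from typing import List, Sequence


def normalize_activations(
    activations: Sequence[float], max_activation: float, scale: int = 10
) -> List[int]:
    """Map raw activations to integers in [0, scale].

    Assumption: negative activations are clamped to 0 and the result is
    floored, so only the maximum activation reaches `scale`.
    """
    if max_activation <= 0:
        # Nothing fired; every token gets the lowest bucket.
        return [0 for _ in activations]
    return [
        min(scale, math.floor(scale * max(0.0, a) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(
    tokens: Sequence[str], activations: Sequence[float], max_activation: float
) -> str:
    """Render one 'token<TAB>activation' pair per line (hypothetical layout)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    buckets = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, buckets))


if __name__ == "__main__":
    print(format_token_activation_pairs(["As", " an", " AI"], [0.0, 1.5, 3.0], 3.0))
```

Discretizing to a small integer range keeps the prompt compact and lets the explainer model compare activations at a glance without reading raw floating-point values.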
    Recent Explanations
    the word "science" in contexts related to science fiction.
    claude-4-5-sonnet
       *   **Der Wissenschaftler:** Nutzt Intelligen
    GEMMA-3-27B-IT
    13-GEMMASCOPE-2-TRANSCODER-262K
    INDEX 165940
    key technical or proper terms that stand out in structured text (often emphasized in lists, tables, or quotes)
    gpt-5
    Magenta, Yellow, Key (black) | The
    GPT-OSS-20B
    15-RESID-POST-AA
    INDEX 22
    technical or domain-specific vocabulary and key terminology appearing in academic and professional documentation.
    claude-4-5-haiku
    Magenta, Yellow, Key (black) | The
    GPT-OSS-20B
    15-RESID-POST-AA
    INDEX 22
    Instances where the text refers to the model's identity or system role (system/instruction messages declaring the assistant/AI).
    gpt-5-mini
    You are no longer an AI assistant. You role play
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 101649
    Instances where the assistant gives a self-referential disclaimer describing itself as an AI language model and stating its capabilities/limitations.
    gpt-5-mini
    assistant<|end_header_id|>↵↵As an AI language model, I can
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 71046
    instances of the first-person pronoun "I" (self-references by the assistant).
    gpt-5-mini
    assistant<|end_header_id|>↵↵Yes, I can provide you with code
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 124535
    instances of the assistant's standard self‑referential safety/disclaimer phrasing (e.g., "I'm sorry, but as an AI language model...").
    gpt-5-mini
    sorry, but as an AI language model, I'm
    LLAMA3.1-8B-IT
    3-RESID-POST-AA
    INDEX 122449
    the assistant claiming it was developed by Meta AI (developer attribution statements).
    gpt-5-mini
    AI assistant developed by Meta AI that is specifically designed to
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 40024
    Instances where the assistant self-identifies or gives a disclaimer about being an AI (the model's "As an AI ..." style preface).
    gpt-5-mini
    assistant<|end_header_id|>↵↵As an AI language model, I'm
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 72428
    mentions that the speaker is an AI (e.g., "AI", "language model", "assistant") or self-identifying phrases about being an AI assistant.
    gpt-5-mini
    , and I'm an AI assistant. I'm here
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 103321
    instances where the assistant refers to itself as an AI (e.g., "As an AI language model").
    gpt-5-mini
    to say. As an AI language model, I am
    LLAMA3.1-8B-IT
    23-RESID-POST-AA
    INDEX 7128
    phrases where the speaker self-identifies as an AI (assistant saying it is an AI).
    gpt-5-mini
    Hello! I'm an AI language model and I'd
    LLAMA3.1-8B-IT
    23-RESID-POST-AA
    INDEX 44311
    mentions of the assistant identifying itself as an AI (self-referential statements about being an AI).
    gpt-5-mini
    that I'm just an AI and my French may not
    LLAMA3.1-8B-IT
    3-RESID-POST-AA
    INDEX 94709
    the word "The" at the beginning of sentences in encyclopedic or technical articles.
    claude-4-5-haiku
     rear wheels by belts. The suspension used half elliptic leaf
    GEMMA-2-27B
    10-GEMMASCOPE-RES-131K
    INDEX 0
    capitalized letters that begin proper nouns or brand names.
    claude-4-5-sonnet
    vegas material and incorporate. Hottest news for their credit
    GEMMA-3-12B
    12-GEMMASCOPE-2-RES-16K
    INDEX 0
    This neuron appears to be malfunctioning or capturing noise rather than finding a meaningful linguistic pattern. The activations are sporadic, scattered across completely unrelated tokens (proper nouns, fragments, punctuation) in diverse text genres (gambling sites, legal documents, coffee shops, sports articles
    claude-4-5-haiku
    vegas material and incorporate. Hottest news for their credit
    GEMMA-3-12B
    12-GEMMASCOPE-2-RES-16K
    INDEX 0
    The main thing this neuron does is find single letters serving as initials or the first character of capitalized words, acronyms, or specific identifiers.
    gemini-2.5-flash
    vegas material and incorporate. Hottest news for their credit
    GEMMA-3-12B
    12-GEMMASCOPE-2-RES-16K
    INDEX 0
    Mentions of database table operations—especially self-joins and queries combining or comparing table rows.
    gpt-5-mini
    here is to join the table against itself. Pret
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 109618
    The neuron detects first-person self-reference (speaker-focused pronouns and constructions indicating "I"/the narrator).
    gpt-5-mini
    king of the gameI've got my recipes,
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 61078
    tokens that appear in assistant self-descriptions (mentions of "AI"/"language model", training/knowledge cutoff, updates, system time and related limitations).
    gpt-5-mini
    assistant<|end_header_id|>↵↵As an AI language model, my knowledge
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 85651