EXPLANATION TYPE
    oai_token-act-pair
    Description
    OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
    Author
    OpenAI
    URL
    https://github.com/hijohnnylin/automated-interpretability
    Settings
    Default prompts from the main branch, strategy TokenActivationPair.
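The TokenActivationPair strategy is not spelled out on this page. As a rough sketch, assuming it follows the token/activation layout described in the paper (each token listed with its activation, normalized to an integer from 0 to 10), the formatting step might look like the following. The function names and the exact tab-separated layout are illustrative assumptions, not the repository's confirmed API.

```python
import math


def normalize_activations(activations, max_activation):
    """Map raw activations to integers in 0..10; negative values clamp to 0.

    Assumption: scaling is relative to a single max activation for the
    neuron or feature, as described in the paper.
    """
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(10, math.floor(10 * max(0.0, a) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(tokens, activations, max_activation):
    """Render one "token<TAB>activation" pair per line (hypothetical layout)."""
    normalized = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, normalized))


if __name__ == "__main__":
    tokens = ["As", " an", " AI"]
    raw = [0.0, 1.0, 4.0]
    print(format_token_activation_pairs(tokens, raw, max_activation=4.0))
```

An explainer model would then be prompted with several such excerpts and asked to summarize what the high-activation tokens have in common, which is how the explanations listed below were presumably produced.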
    Recent Explanations
    key technical or proper terms that stand out in structured text (often emphasized in lists, tables, or quotes)
    gpt-5
    Magenta, Yellow, Key (black) | The
    GPT-OSS-20B
    15-RESID-POST-AA
    INDEX 22
    technical or domain-specific vocabulary and key terminology appearing in academic and professional documentation.
    claude-4-5-haiku
    Magenta, Yellow, Key (black) | The
    GPT-OSS-20B
    15-RESID-POST-AA
    INDEX 22
    Instances where the text refers to the model's identity or system role (system/instruction messages declaring the assistant/AI).
    gpt-5-mini
    You are no longer an AI assistant. You role play
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 101649
    Instances where the assistant gives a self-referential disclaimer describing itself as an AI language model and stating its capabilities/limitations.
    gpt-5-mini
    assistant<|end_header_id|>↵↵As an AI language model, I can
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 71046
    instances of the first-person pronoun "I" (self-references by the assistant).
    gpt-5-mini
    assistant<|end_header_id|>↵↵Yes, I can provide you with code
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 124535
    instances of the assistant's standard self‑referential safety/disclaimer phrasing (e.g., "I'm sorry, but as an AI language model...").
    gpt-5-mini
    sorry, but as an AI language model, I'm
    LLAMA3.1-8B-IT
    3-RESID-POST-AA
    INDEX 122449
    the assistant claiming it was developed by Meta AI (developer attribution statements).
    gpt-5-mini
    AI assistant developed by Meta AI that is specifically designed to
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 40024
    Instances where the assistant self-identifies or gives a disclaimer about being an AI (the model's "As an AI ..." style preface).
    gpt-5-mini
    assistant<|end_header_id|>↵↵As an AI language model, I'm
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 72428
    mentions that the speaker is an AI (e.g., "AI", "language model", "assistant") or self-identifying phrases about being an AI assistant.
    gpt-5-mini
    , and I'm an AI assistant. I'm here
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 103321
    instances where the assistant refers to itself as an AI (e.g., "As an AI language model").
    gpt-5-mini
    to say. As an AI language model, I am
    LLAMA3.1-8B-IT
    23-RESID-POST-AA
    INDEX 7128
    phrases where the speaker self-identifies as an AI (assistant saying it is an AI).
    gpt-5-mini
    Hello! I'm an AI language model and I'd
    LLAMA3.1-8B-IT
    23-RESID-POST-AA
    INDEX 44311
    mentions of the assistant identifying itself as an AI (self-referential statements about being an AI).
    gpt-5-mini
    that I'm just an AI and my French may not
    LLAMA3.1-8B-IT
    3-RESID-POST-AA
    INDEX 94709
    the word "The" at the beginning of sentences in encyclopedic or technical articles.
    claude-4-5-haiku
     rear wheels by belts. The suspension used half elliptic leaf
    GEMMA-2-27B
    10-GEMMASCOPE-RES-131K
    INDEX 0
    capitalized letters that begin proper nouns or brand names.
    claude-4-5-sonnet
    vegas material and incorporate. Hottest news for their credit
    GEMMA-3-12B
    12-GEMMASCOPE-2-RES-16K
    INDEX 0
    This neuron appears to be malfunctioning or capturing noise rather than finding a meaningful linguistic pattern. The activations are sporadic, scattered across completely unrelated tokens (proper nouns, fragments, punctuation) in diverse text genres (gambling sites, legal documents, coffee shops, sports articles
    claude-4-5-haiku
    vegas material and incorporate. Hottest news for their credit
    GEMMA-3-12B
    12-GEMMASCOPE-2-RES-16K
    INDEX 0
    The main thing this neuron does is find single letters serving as initials or the first character of capitalized words, acronyms, or specific identifiers.
    gemini-2.5-flash
    vegas material and incorporate. Hottest news for their credit
    GEMMA-3-12B
    12-GEMMASCOPE-2-RES-16K
    INDEX 0
    Mentions of database table operations—especially self-joins and queries combining or comparing table rows.
    gpt-5-mini
    here is to join the table against itself. Pret
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 109618
    The neuron detects first-person self-reference (speaker-focused pronouns and constructions indicating "I"/the narrator).
    gpt-5-mini
    king of the gameI've got my recipes,
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 61078
    tokens that appear in assistant self-descriptions (mentions of "AI"/"language model", training/knowledge cutoff, updates, system time and related limitations).
    gpt-5-mini
    assistant<|end_header_id|>↵↵As an AI language model, my knowledge
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 85651
    This neuron detects first-person self-referential words and role/identity declarations (tokens like "I", "I'm", "am" and similar self-identifying phrases).
    gpt-5-mini
    their race or ethnicity. I am not a real entity
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 75732