EXPLANATION TYPE
    oai_token-act-pair
    Description
    OpenAI's automated interpretability method from the paper "Language models can explain neurons in language models", modified by Johnny Lin to support additional models and context windows.
    Author
    OpenAI
    URL
    https://github.com/hijohnnylin/automated-interpretability
    Settings
    Default prompts from the main branch, using the TokenActivationPair strategy.
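    The TokenActivationPair strategy shows the explainer model each token of an example alongside the feature's activation on that token, then asks it to summarize what the feature responds to. A minimal sketch of that prompt construction is below; the function names and the 0-10 integer scaling are illustrative assumptions, not the repository's actual API.

```python
def format_token_activation_pairs(tokens, activations, max_scale=10):
    """Scale activations to integers in 0..max_scale and pair each with its token."""
    peak = max(activations) or 1.0  # avoid division by zero on all-zero examples
    lines = []
    for tok, act in zip(tokens, activations):
        scaled = round(max_scale * max(act, 0.0) / peak)
        lines.append(f"{tok}\t{scaled}")
    return "\n".join(lines)

def build_explainer_prompt(examples):
    """examples: list of (tokens, activations) pairs from one feature's top snippets."""
    parts = [
        "We're studying a neuron in a language model.",
        "For each snippet, every token is listed with the neuron's activation (0-10).",
        "Explain in one sentence what the neuron detects.",
        "",
    ]
    for i, (tokens, acts) in enumerate(examples, 1):
        parts.append(f"Snippet {i}:")
        parts.append(format_token_activation_pairs(tokens, acts))
        parts.append("")
    return "\n".join(parts)
```

    The resulting prompt is sent to the explainer model (gpt-5-mini, claude-4-5-sonnet, etc. in the entries below), and the one-sentence reply becomes the stored explanation.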
    Recent Explanations

    LLAMA3.1-8B-IT · 19-RESID-POST-AA · INDEX 28851 (explained by gpt-5-mini)
        Mentions of feedback—especially feedback loops or iterative feedback mechanisms.
        Example: "izing algorithms or by incorporating feedback from external sources.↵2"

    LLAMA3.1-8B-IT · 19-RESID-POST-AA · INDEX 44904 (explained by gpt-5-mini)
        This neuron detects document-level headings, titles and other structural/metadata elements (table-of-contents and section headings).
        Example: "ENTS↵↵↵THE BOOMING OF ACRE HILL"

    LLAMA3.1-8B-IT · 19-RESID-POST-AA · INDEX 52414 (explained by gpt-5-mini)
        It detects technical chemical/chemical‑industry terminology—especially long chemical compound names and words about synthesis/production.
        Example: "oxepin-9-one involves the synthesis of the"

    LLAMA3.1-8B-IT · 19-RESID-POST-AA · INDEX 104362 (explained by gpt-5-mini)
        This neuron detects mentions of the concept "focus" — appearing as that token (including in brand/product names) or in phrases about focusing, feedback, or focus groups.
        Example: "our March 4th Focus Flight!↵.↵Seats are"

    LLAMA3.1-8B-IT · 19-RESID-POST-AA · INDEX 105095 (explained by gpt-5-mini)
        This neuron detects speaker/role labels and dialogue-turn markers (tokens identifying who is speaking or header IDs).
        Example: "deal with.↵↵Person B: I'm sorry to hear"

    LLAMA3.1-8B-IT · 19-RESID-POST-AA · INDEX 127785 (explained by gpt-5-mini)
        Document-structure markers and formal section headings (e.g., Title, Abstract, Introduction, Section labels).
        Example: "recent citations, and how it impacts the estimated costs for"

    LLAMA3.1-8B-IT · 19-RESID-POST-AA · INDEX 110626 (explained by gpt-5-mini)
        The neuron detects self-referential/reflexive language—tokens and phrases that refer back to the subject itself (e.g., "self", "itself", "talking to itself", "self-...").
        Example: "Government; literally the system talking to itself!↵↵While our"

    LLAMA3.1-8B-IT · 19-RESID-POST-AA · INDEX 77783 (explained by gpt-5-mini)
        References to rises in intracellular calcium ([Ca2+]i) and related channel activity/activation (often tied to proteolytic activity or positive-feedback signaling) in biomedical text.
        Example: "proteolytic activity in a positive feedback loop, leading"

    LLAMA3.1-8B-IT · 19-RESID-POST-AA · INDEX 51376 (explained by gpt-5-mini)
        Detects self-referential or meta discussion about storytelling/theatre (mentions of stories, plays, writing about writing, and meta-theatrical commentary).
        Example: "kind of meta-theater that questions the nature of art"

    LLAMA3.1-8B-IT · 19-RESID-POST-AA · INDEX 27476 (explained by gpt-5-mini)
        It detects content-bearing verbs and descriptive adjectives—action or property words used in narratives.
        Example: "floating by and decided to embrace it.↵As she watched"

    LLAMA3.1-8B-IT · 19-RESID-POST-AA · INDEX 57191 (explained by gpt-5-mini)
        The neuron's main role is to detect words about professional qualifications, experience, goals, motivations, and other job/role-related attributes.
        Example: "applicant about their goals and motivations. This will help you"

    QWEN3-4B · 11-TRANSCODER-HP · INDEX 139180 (explained by claude-4-5-haiku)
        Comma-separated lists, item enumeration, and sequential parallel structures.
        Example: "bank’s involvement in 1MDB, described as"

    QWEN3-4B · 10-TRANSCODER-HP · INDEX 121413 (explained by claude-4-5-sonnet)
        Words related to correctness, accuracy, or doing something right or wrong.
        Example: "indicating how many problems are correctly answered and how many are"

    QWEN3-4B · 10-TRANSCODER-HP · INDEX 121413 (explained by gpt-5)
        Statements evaluating correctness or accuracy of responses or decisions, including references to errors.
        Example: "indicating how many problems are correctly answered and how many are"

    QWEN3-4B · 11-TRANSCODER-HP · INDEX 82089 (explained by gpt-5)
        Statements about correctness and accuracy, including references to errors and evaluation of answer or task performance.
        Example: "indicating how many problems are correctly answered and how many are"

    QWEN3-4B · 9-TRANSCODER-HP · INDEX 89857 (explained by gpt-5)
        References to performance evaluation—statements about correctness, incorrectness, error counts, and measured accuracy or success in tasks or tests.
        Example: "the problems which are answered incorrectly and/or practice more."

    QWEN3-4B · 9-TRANSCODER-HP · INDEX 151143 (explained by claude-4-5-sonnet)
        Words and phrases indicating defeat, losing, or trailing behind in competitive contexts.
        Example: "Last year's quarter-final loss to comp"

    QWEN3-4B · 9-TRANSCODER-HP · INDEX 151143 (explained by gpt-5)
        Descriptions of unfavorable status or outcomes—being behind, losing, or otherwise negative—often in competitive or evaluative contexts.
        Example: "Last year's quarter-final loss to comp"

    QWEN3-4B · 11-TRANSCODER-HP · INDEX 128789 (explained by claude-4-5-sonnet)
        Words with negative connotations indicating problems, defects, wrongness, or undesirable qualities.
        Example: "and allows the use of invalid certificates. When using the"

    QWEN3-4B · 11-TRANSCODER-HP · INDEX 128789 (explained by gpt-5)
        Language signaling problems, faults, or negative conditions, often marked by negation or deficiency (e.g., invalidity, lack, inability, mis- states, failures, issues, or harms).
        Example: "and allows the use of invalid certificates. When using the"