EXPLANATION TYPE
    oai_token-act-pair
    Description
    OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
    Author
    OpenAI
    URL
    https://github.com/hijohnnylin/automated-interpretability
    Settings
    Default prompts from the main branch, strategy TokenActivationPair.
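As a rough illustration of what a token-activation-pair strategy feeds to the explainer model, the sketch below formats one text excerpt as one `token<TAB>activation` line per token, with activations scaled to integers from 0 to 10. The function names, the example tokens, and the exact rounding rule are assumptions made for illustration; they are not taken from the repository above, whose prompt builder may differ in detail.

```python
import math
from typing import List, Sequence


def normalize_activations(activations: Sequence[float], max_activation: float) -> List[int]:
    """Scale raw activations to integers in 0..10 (hypothetical helper).

    Negative activations are clamped to 0; values above max_activation clamp to 10.
    """
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(10, max(0, math.floor(10 * a / max_activation)))
        for a in activations
    ]


def format_token_activation_pairs(
    tokens: Sequence[str],
    activations: Sequence[float],
    max_activation: float,
) -> str:
    """Render one excerpt as tab-separated token/activation lines (hypothetical helper)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    scaled = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, scaled))


if __name__ == "__main__":
    # Made-up tokens and activations, for illustration only.
    print(format_token_activation_pairs(["take", " over", " as"], [0.0, 2.5, 5.0], 5.0))
```

The explainer model would then see several such excerpts and be asked for a one-line description of what the feature responds to, which is the kind of text listed under Recent Explanations.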
    Recent Explanations
    numbers and numerical/quantitative information such as statistics, metrics, dates, and monetary values.
    claude-4-5-haiku
    334 (Thirty-Four Thousand, Three Hundred
    GEMMA-3-27B
    53-GEMMASCOPE-2-RES-65K
    INDEX 1852
    mild, hedged negative evaluations that point out minor drawbacks or limitations, often using qualifying language or “on the … side” constructions.
    gpt-5
    their time-management skills, as they have a tendency to
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 114664
    statements conveying a middling, neutral appraisal in reviews—indicating something is acceptable or average rather than excellent or poor.
    gpt-5
    okay student in high school, and now he's struggling
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 6076
    negative evaluative commentary in reviews, especially statements highlighting a work’s flaws, weaknesses, or lack of quality
    gpt-5
    The problem I had is that the most interesting parts of
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 58280
    expressions of dissatisfaction in consumer reviews, especially complaints about poor quality, bad fit, defects, or unmet expectations.
    gpt-5
    because the style is great. Too bad about the execution
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 9301
    strongly negative evaluative language indicating criticism, dissatisfaction, or derision, especially in reviews and complaints.
    gpt-5
    of a deceased title that should have stayed dead. Without
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 92200
    mathematical expressions and calculations involving variables and numerical operations.
    claude-4-5-haiku
    13 = 3. And 11 boys would give
    DEEPSEEK-R1-DISTILL-LLAMA-8B
    17-LLAMASCOPE-SLIMPJ-OPENR1-RES-32K
    INDEX 23818
    mathematical expressions and equations containing variables and numerical operations.
    claude-4-5-haiku
    ). Again, the neighbors of 3 can be
    DEEPSEEK-R1-DISTILL-LLAMA-8B
    21-LLAMASCOPE-SLIMPJ-OPENR1-RES-32K
    INDEX 29518
    domain-specific technical or formal nouns that name systems, processes, fields, measurements, or key entities in a topic (e.g., networks, data, instruments)
    gpt-5
    This is a type of intensity which is not the property
    LLAMA3.1-8B-IT
    19-RESID-POST-AA
    INDEX 35399
    phrases indicating someone assuming control or a role/position, especially “take over as” style leadership or responsibility transitions.
    gpt-5
    On:↵↵Ryan will take over full-time GM duties↵↵
    LLAMA3.1-8B-IT
    19-RESID-POST-AA
    INDEX 9301
    mentions of a specific proper-noun brand/company name (a named brand reference).
    gpt-5
    , so don’t let Giant’s marketing people put you
    LLAMA3.1-8B-IT
    19-RESID-POST-AA
    INDEX 91616
    mentions of entertainment awards and honors, including nominations, wins, categories, and award-show contexts.
    gpt-5
    including the Genie Awards, The Gemini Awards, The
    LLAMA3.1-8B-IT
    19-RESID-POST-AA
    INDEX 54690
    mentions of the legal term “certiorari” in U.S. court case citations or headings.
    gpt-5
    11th Cir. Certiorari denied.↵↵
    LLAMA3.1-8B-IT
    19-RESID-POST-AA
    INDEX 118424
    snippets of source code (programming tokens and identifiers).
    gpt-5-mini
    keep_fnames: false,↵ mangle:
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 6662
    tokens that are parts of systematic chemical compound names (IUPAC-style fragments, numbers and hyphenated segments).
    gpt-5-mini
    ographic Chemicals3-Chloro-6-
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 13279
    phrases where the speaker asks for help identifying something (first‑person requests/questions like "can anyone help identify this" or "what is this").
    gpt-5-mini
    my experience (She says it is a cactus but
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 109563
    the neuron detects interrogative/question cues — tokens that start or appear in questions (question words and auxiliaries used to form questions).
    gpt-5-mini
    Which time period would you choose and why?"<|eot_id|><|start_header_id|>
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 8655
    tokens that are part of markup/structural document tags (HTML/XML‑style tags and other structural delimiters).
    gpt-5-mini
    media="print" />↵ <script type="text
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 52056
    This neuron detects first-person self-referential pronouns and tokens (e.g., "I", "me", "my", and equivalents in other languages).
    gpt-5-mini
    gosto de muitas coisas, desde a po
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 123529
    questions asking about the assistant's personal attributes or identity (age, location, appearance, name).
    gpt-5-mini
    can you describe what you look like?<|eot_id|><|start_header_id|>assistant
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 77487