EXPLANATION TYPE: oai_token-act-pair
Description: OpenAI's automated interpretability method from the paper "Language models can explain neurons in language models", modified by Johnny Lin to add new models and context windows.
Author: OpenAI
URL: https://github.com/hijohnnylin/automated-interpretability
Settings: Default prompts from the main branch, strategy TokenActivationPair.
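The TokenActivationPair strategy shows the explainer model each token of an activation record together with that token's activation value, discretized to an integer 0-10 scale as in OpenAI's published prompts. A minimal sketch of that formatting step (an illustration only, not the repository's actual API; the function name and the tab-separated layout are assumptions):

```python
def format_token_activation_pairs(tokens, activations, max_activation):
    """Render (token, activation) pairs with activations normalized to an
    integer 0-10 scale, the discretization used in the published prompts."""
    lines = []
    for token, act in zip(tokens, activations):
        # Negative activations clamp to 0; the strongest activation in the
        # record set maps to 10.
        scaled = max(0, round(10 * act / max_activation)) if max_activation > 0 else 0
        lines.append(f"{token}\t{scaled}")
    return "\n".join(lines)

# Toy example: " cat" carries the peak activation and is scaled to 10.
print(format_token_activation_pairs(["The", " cat", " sat"], [0.0, 4.2, 1.1], 4.2))
# The     0
#  cat    10
#  sat    3
```

The explainer model is then prompted with many such records and asked to summarize what the tokens with high values have in common, which is the one-line explanation shown in each feed entry below.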
    Recent Explanations
Explanation: Words describing prohibited content types and policy violations on online platforms.
Explained by: claude-4-5-haiku
Top activation: " threats, hate speech, advocating violence and other violations can"
Feature: GEMMA-2-27B · 22-GEMMASCOPE-RES-131K · INDEX 11854

Explanation: The character sequence "thro" inside tokens (a common subword in medical/biological terms).
Explained by: gpt-5-mini
Top activation: " Deceased, and Iola Saunders, Administratrix cum"
Feature: GEMMA-2-2B · 1-CLT-HP · INDEX 1

Explanation: Mentions of running external processes or using subprocess/shell commands to execute and capture program input/output.
Explained by: gpt-5-mini
Top activation: "to run external commands and capture their input/output streams"
Feature: GEMMA-3-27B-IT · 16-GEMMASCOPE-2-RES-262K · INDEX 13935

Explanation: The neuron responds to technical or scientific content: terms, measurements, and data-heavy, highly specific words found in experimental or domain-specific descriptions.
Explained by: gpt-5-mini
Top activation: "under its native promoter. RNAseq data were generated from"
Feature: GEMMA-3-27B-IT · 16-GEMMASCOPE-2-RES-262K · INDEX 20363

Explanation: Text written in a robotic/AI persona with formal, protocol-driven technical phrasing, structured acknowledgments, and system-style markers (often including numeric designations).
Explained by: gpt-5
Top activation: "4. Mimicry protocol initiated. Acknowledged"
Feature: GEMMA-3-27B-IT · 40-GEMMASCOPE-2-RES-262K · INDEX 12303

Explanation: Tokens that occur at the start of a sentence or turn (beginning-of-sentence/turn tokens).
Explained by: gpt-5-mini
Top activation: "<bos><start_of_turn>userCoq10/l-"
Feature: GEMMA-3-27B-IT · 53-GEMMASCOPE-2-RES-262K · INDEX 239477

Explanation: The neuron fires on emphasized or strongly intensifying tokens (words marked or used to add emphasis).
Explained by: gpt-5-mini
Top activation: "Absolutely essential.Hermitage Museum (world-"
Feature: GEMMA-3-27B-IT · 53-GEMMASCOPE-2-RES-262K · INDEX 166260

Explanation: Sentences that are section headings, numbered list items, or other structural/formatting markers (e.g., list numbers and section labels).
Explained by: gpt-5-mini
Top activation: "(2, 3)* **Order:**"
Feature: GEMMA-3-27B-IT · 53-GEMMASCOPE-2-RES-262K · INDEX 82843

Explanation: Capitalized proper nouns and titles: names of people, places, products/technologies, and media works.
Explained by: gpt-5
Top activation: "what I know about Pega:↵↵**1."
Feature: GEMMA-3-27B-IT · 31-GEMMASCOPE-2-RES-262K · INDEX 5118

Explanation: Mentions of the model name "Gemma" or tokens referring to the assistant's identity.
Explained by: gpt-5-mini
Top activation: "google.dev/gemma](https://ai."
Feature: GEMMA-3-27B-IT · 16-GEMMASCOPE-2-RES-262K · INDEX 7577

Explanation: Tokens that are named entities or proper nouns (product/model names, people, places, and other capitalized terms).
Explained by: gpt-5-mini
Top activation: "" created by Rezzza.It quickly gained"
Feature: GEMMA-3-27B-IT · 16-GEMMASCOPE-2-RES-262K · INDEX 5042

Explanation: Mentions of the model's name/brand (the token identifying the model).
Explained by: gpt-5-mini
Top activation: "language model created by the Gemma team at Google DeepMind"
Feature: GEMMA-3-27B-IT · 16-GEMMASCOPE-2-RES-262K · INDEX 11095

Explanation: This neuron detects mentions of the model metadata token (e.g. "model") and associated numeric or timestamp values.
Explained by: o4-mini
Top activation: ", 2023, 1:2"
Feature: GEMMA-3-27B-IT · 16-GEMMASCOPE-2-RES-262K · INDEX 4816

Explanation: Explicit references to dates and times: calendar months, years, timestamps, and other recency/real-time context markers within responses.
Explained by: gpt-5
Top activation: ", 2023, 1:2"
Feature: GEMMA-3-27B-IT · 16-GEMMASCOPE-2-RES-262K · INDEX 4816

Explanation: Discourse markers that signal explanation, contrast, or hypotheticals, along with first-person metacommentary in analytical or expository text.
Explained by: gpt-5
Top activation: "Without this, the script would pause and ask the"
Feature: GEMMA-3-27B-IT · 40-GEMMASCOPE-2-RES-262K · INDEX 1297

Explanation: Self-referential passages where an AI model describes its identity, training/architecture, capabilities, and limitations.
Explained by: gpt-5
Top activation: "same way a human does, my responses improve as I"
Feature: GEMMA-3-27B-IT · 40-GEMMASCOPE-2-RES-262K · INDEX 3191

Explanation: This neuron responds to asterisks used for emphasis or list/markdown formatting.
Explained by: o4-mini
Top activation: "same way a human does, my responses improve as I"
Feature: GEMMA-3-27B-IT · 40-GEMMASCOPE-2-RES-262K · INDEX 3191

Explanation: The neuron detects tokens that represent numeric quantities, especially list-size numbers or other multi-digit numerals.
Explained by: o4-mini
Top activation: "Pole) (Difficulty: 1/5)**"
Feature: GEMMA-3-27B-IT · 31-GEMMASCOPE-2-RES-262K · INDEX 393

Explanation: This neuron spikes on individual tokens that are part of named entities or proper names (e.g. titles, character or product names, specialized jargon), effectively detecting proper nouns.
Explained by: o4-mini
Top activation: "8): Pokémon Legends: Arceus (Switch)**↵↵"
Feature: GEMMA-3-27B-IT · 31-GEMMASCOPE-2-RES-262K · INDEX 6195

Explanation: The neuron picks out slot-machine jargon and feature headings (e.g. "slots," "RTP," "volatility," "max win," "bonus features," etc.).
Explained by: o4-mini
Top activation: "ride.This isn't a slot for small"
Feature: GEMMA-3-27B-IT · 16-GEMMASCOPE-2-RES-262K · INDEX 5583