EXPLANATION TYPE
    oai_token-act-pair
    Description
    OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
    Author
    OpenAI
    URL
    https://github.com/hijohnnylin/automated-interpretability
    Settings
    Default prompts from the main branch, strategy TokenActivationPair.
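The TokenActivationPair strategy shows the explainer model each token of an example text next to the neuron's activation on that token. Below is a minimal sketch of that formatting step. It assumes activations are rescaled to integers from 0 to 10 relative to the neuron's maximum activation, as described in the paper, and rendered as tab-separated token/activation lines. The function name `format_token_activation_pairs` is hypothetical and is not taken from the automated-interpretability repository.

```python
from typing import List


def format_token_activation_pairs(
    tokens: List[str], activations: List[float], max_activation: float
) -> str:
    """Hypothetical helper: render one example as token<TAB>activation lines.

    Assumption: activations are scaled to integers 0-10 relative to
    max_activation; negative values map to 0 and values above the
    maximum are clamped to 10.
    """
    lines = []
    for token, act in zip(tokens, activations):
        if max_activation <= 0:
            # No positive reference activation, so every token is shown as 0.
            scaled = 0
        else:
            scaled = round(10 * max(act, 0.0) / max_activation)
            scaled = min(scaled, 10)
        lines.append(f"{token}\t{scaled}")
    return "\n".join(lines)
```

For example, the tokens `["The", " neuron", " fires"]` with activations `[0.0, 2.5, 5.0]` and a maximum of `5.0` render as three lines whose activation values are 0, 5 and 10.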
    Recent Explanations
    It detects the assistant's safety/refusal boilerplate — statements explaining it’s programmed to be safe and why it won’t comply with harmful or disallowed requests.
    gpt-5-mini
    helpful AI assistant. As such, I **absolutely cannot
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 2686
    medical and clinical terms — disease names, organs, symptoms, and related health/medical vocabulary.
    gpt-5-mini
    body produce thick and sticky mucus. This mucus clogs
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 3465
    sentences where the speaker identifies or describes the AI/model (self‑introductions and metadata like "I am Gemma", "a large language model", "open-weights", model type/version).
    gpt-5-mini
    ** Vicuna is a large language model (LLM
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 7262
    Finds tokens that name roles, entities, organizations, or technical/ideological labels.
    gpt-5-mini
    , the far-right libertarian, was inaugurated as President
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 6461
    utterances where the assistant offers help or says it can/will assist (first‑person offers of assistance).
    gpt-5-mini
    , summarizing customer feedback, assisting with code generation]. I
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 113248
    The neuron strongly activates for disclaimer language indicating medical advice (occurrences of the phrase "medical advice" or similar disclaimer statements).
    gpt-5-mini
    and does not constitute medical advice. It is essential to
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 9301
    Tokens that are part of the model's own assistant-generated replies (strongly activates on tokens in long assistant responses).
    gpt-5-mini
    armchair, a worn velvet piece shed rescued from
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 4799
    The neuron responds to occurrences of the word “fear,” effectively detecting expressions of fear.
    o4-mini
    .↵↵Martins fear of darkness and her seeking
    GEMMA-3-12B
    41-GEMMASCOPE-2-RES-65K
    INDEX 15063
    mentions of fear or expressions of being afraid/anxious.
    gpt-5-mini
    .↵↵Martins fear of darkness and her seeking
    GEMMA-3-12B
    41-GEMMASCOPE-2-RES-65K
    INDEX 15063
    The neuron detects personal names and name-like placeholders/proper nouns (e.g., person names or bracketed name fields).
    gpt-5-mini
    wanted to let you know that my grandma, [Grand
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 12747
    assistant/model responses that provide structured explanations or evaluations—especially noting flaws, limitations, or following task instructions.
    gpt-5
    these are areas where AI currently struggles.    
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 1550
    vivid, atmospheric scene-setting that uses rich sensory imagery—especially environmental and olfactory details—to paint a concrete mood or setting.
    gpt-5
    air thick with the scent of pine and damp moss.
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 956
    directive and meta-instructional scaffolding in chat-style prompts, especially role/format tags, command markup, and structured response templates indicating how to answer.
    gpt-5
    and you continue the story in the 3rd person
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 1138
    text about games and competition—covering gameplay, winning/odds, and mechanics across recreational games and gambling contexts.
    gpt-5
    The odds of winning the jackpot in any major lottery game
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 961
    metadiscourse and organizational cues in assistant-style instructional text—section introductions, polite directives, breakdown/outline language, and list/heading markers.
    gpt-5
    the lubricant and the partner's body.*
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 446
    conditional, contingency-oriented phrasing that describes exceptions, alternatives, or what to do when something fails.
    gpt-5
    Manual Activation:** If automatic activation fails, you'll
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 11594
    safety-focused refusals that empathetically redirect from harmful or inappropriate requests and offer supportive guidance and crisis resources instead of compliance.
    gpt-5
    support you.↵↵**To help me understand how I
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 800
    phrases that explicitly emphasize importance or primacy, marking something as the key, main, or most significant point.
    gpt-5
    Plenty of Water:** This is *essential*. Aim for
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 283
    structured list segments and metric annotations, such as section headers with colons and quantitative ranges with units (percentages, time, money, per-month).
    gpt-5
    Visualization Scripts:** (Effort: Low - 2
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 1021
    It detects requests or content specifically about writing LinkedIn (social-media) posts — i.e., prompts to create or options for LinkedIn post copy.
    gpt-5-mini
    electromobility?<end_of_turn><start_of_turn>modelOkay,
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 214