EXPLANATION TYPE
    oai_token-act-pair
    Description
    OpenAI's Automated Interpretability method, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to support new models and context windows.
    Author
    OpenAI
    URL
    https://github.com/hijohnnylin/automated-interpretability
    Settings
    Default prompts from the main branch, strategy TokenActivationPair.
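    The TokenActivationPair strategy presents the explainer model with each token paired with its activation. The sketch below illustrates that idea only. It assumes activations are clipped at zero and rescaled to integers from 0 to 10 relative to the maximum activation, with each record wrapped in "<start>"/"<end>" markers. The helper name format_token_activation_pairs is hypothetical, and the repository's actual prompt wording and normalization may differ.

```python
from typing import Sequence


def format_token_activation_pairs(
    tokens: Sequence[str],
    activations: Sequence[float],
    max_activation: float,
) -> str:
    """Hypothetical helper: render one record as tab-separated token/activation lines.

    Assumption: activations are clipped at zero, scaled to 0..10 relative to
    max_activation, and rounded to integers.
    """
    if max_activation <= 0:
        raise ValueError("max_activation must be positive")
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")

    lines = ["<start>"]  # assumed record delimiter
    for token, activation in zip(tokens, activations):
        # Clip negatives, rescale to 0..10, and cap at 10.
        scaled = min(10, int(round(10 * max(activation, 0.0) / max_activation)))
        lines.append(f"{token}\t{scaled}")
    lines.append("<end>")  # assumed record delimiter
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_token_activation_pairs(
        ["The", " odds", " of", " winning"], [0.0, 2.5, 0.4, 5.0], max_activation=5.0
    ))
```

Integer bins keep the prompt compact and make records with different raw activation scales comparable. The trade-off is lower resolution for weak activations.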
    Recent Explanations
    assistant/model responses that provide structured explanations or evaluations—especially noting flaws, limitations, or following task instructions.
    gpt-5
    these are areas where AI currently struggles.
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 1550
    vivid, atmospheric scene-setting that uses rich sensory imagery—especially environmental and olfactory details—to paint a concrete mood or setting.
    gpt-5
    air thick with the scent of pine and damp moss.
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 956
    directive and meta-instructional scaffolding in chat-style prompts, especially role/format tags, command markup, and structured response templates indicating how to answer.
    gpt-5
    and you continue the story in the 3rd person
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 1138
    text about games and competition—covering gameplay, winning/odds, and mechanics across recreational games and gambling contexts.
    gpt-5
    The odds of winning the jackpot in any major lottery game
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 961
    metadiscourse and organizational cues in assistant-style instructional text—section introductions, polite directives, breakdown/outline language, and list/heading markers.
    gpt-5
    the lubricant and the partner's body.*
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 446
    conditional, contingency-oriented phrasing that describes exceptions, alternatives, or what to do when something fails.
    gpt-5
    Manual Activation:** If automatic activation fails, you'll
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 11594
    safety-focused refusals that empathetically redirect from harmful or inappropriate requests and offer supportive guidance and crisis resources instead of compliance.
    gpt-5
    support you.↵↵**To help me understand how I
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 800
    phrases that explicitly emphasize importance or primacy, marking something as the key, main, or most significant point.
    gpt-5
    Plenty of Water:** This is *essential*. Aim for
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 283
    structured list segments and metric annotations, such as section headers with colons and quantitative ranges with units (percentages, time, money, per-month).
    gpt-5
    Visualization Scripts:** (Effort: Low - 2
    GEMMA-3-1B-IT
    13-GEMMASCOPE-2-RES-16K
    INDEX 1021
    It detects requests or content specifically about writing LinkedIn (social-media) posts — i.e., prompts to create or options for LinkedIn post copy.
    gpt-5-mini
    electromobility?<end_of_turn><start_of_turn>modelOkay,
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 214
    the neuron detects proper nouns or named entities (titles, organization names, and other capitalized names).
    gpt-5-mini
    :**↵↵* **Reboot Nation:** [https://
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 527
    the neuron detects document structure markers like section headers and formatted headings (markdown-style emphasis and numbered/listed section indicators).
    gpt-5-mini
    differentiator.↵↵**1. Open Weights The Core
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 15759
    The neuron detects tokens that are part of the model's direct factual answer or highlighted content—especially proper nouns, numbers, and emphasized/answer text.
    gpt-5-mini
    of Bulgaria is **Sofia**. ↵↵It's
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 6097
    language signaling severe harm or abuse—e.g., explicit slurs, sexual violence/exploitation terms, and other highly offensive or harmful content.
    gpt-5-mini
    66-488-7386
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 108312
    mentions of specific language-model names, versions, or size identifiers (e.g., model names with suffixes like "-13B", "1.5", "16K", etc.).
    gpt-5-mini
    **Vicuna-13B:** Built by fine
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 3482
    This neuron detects first-person self-reference (tokens like "I", "I'm", "I am" and phrases where the speaker describes themselves).
    gpt-5-mini
    Gemma, a large language model trained by Google DeepMind
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 11996
    the neuron lights up on salient content words — especially named entities, dates/numbers, and topic-specific keywords (important nouns/terms).
    gpt-5-mini
    initially, it simply referred to a young woman, often
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 257825
    capitalized proper nouns and acronyms denoting specific technical frameworks, AI/ML models, and formal regulatory filings or rules.
    gpt-5
    .* **Entity Framework Core (EF Core):
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 19312
    the neuron detects proper names / named entities (especially personal or character names).
    gpt-5-mini
    D2, Shakuntala and Anand. Their nor
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 869
    the neuron detects date/time-related tokens (months, days, years, and numeric time/datetime components).
    gpt-5-mini
    ) will fall on **April 20th**,
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 123647