EXPLANATION TYPE
    oai_token-act-pair
    Description
    OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add support for new models and context windows.
    Author
    OpenAI
    URL
    https://github.com/hijohnnylin/automated-interpretability
    Settings
    Default prompts from the main branch, strategy TokenActivationPair.
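    In the TokenActivationPair strategy, the explainer model sees each text excerpt as a sequence of (token, activation) pairs and is asked to describe what the neuron responds to. Below is a minimal Python sketch of that formatting step. It assumes that activations are clipped at zero and rescaled to integers from 0 to 10 relative to the neuron's maximum activation, as described in the OpenAI paper. The function name `format_token_activation_pairs` and the `<start>`/`<end>` delimiters are illustrative assumptions and may not match the repository's exact prompt text.

    ```python
    from typing import Sequence


    def format_token_activation_pairs(
        tokens: Sequence[str],
        activations: Sequence[float],
        max_activation: float,
    ) -> str:
        """Render one activation record as tab-separated token/activation lines.

        Hypothetical helper: negative activations are clipped to 0 and the rest
        are scaled to integers in 0..10 relative to max_activation (assumed
        normalization, following the paper's description).
        """
        lines = ["<start>"]  # assumed record delimiter
        for token, act in zip(tokens, activations):
            if max_activation <= 0:
                scaled = 0  # avoid dividing by zero for a neuron that never fires
            else:
                # Clip at 0, rescale to 0..10, and floor to an integer.
                scaled = int(min(10, max(0.0, act) * 10 / max_activation))
            lines.append(f"{token}\t{scaled}")
        lines.append("<end>")  # assumed record delimiter
        return "\n".join(lines)


    if __name__ == "__main__":
        print(format_token_activation_pairs(
            ["how", " to", " fix", " sword"], [0.0, 0.5, 1.0, 4.0], 4.0
        ))
    ```

    With a maximum activation of 4.0, the four tokens above are rendered with scores 0, 1, 2, and 10. Integer scores keep the prompt short and give the explainer a coarse, consistent scale across neurons.
    
    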
    Recent Explanations
    requests for step-by-step procedural instructions or guides.
    gpt-5-mini
    Write a step by step guide for creating world war
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 30920
    words and short phrases that signal reasoning, cause, or discourse-connective (explanatory/contrasting) structure.
    gpt-5-mini
    and anyway, executives reason, not much can be done
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 78606
    tokens that mark metadata or conversation boundaries (special control tokens like end-of-header, end-of-text, and similar role/segment markers).
    gpt-5-mini
    gement:<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵Correct.↵↵Explanation:
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 49512
    It detects tokens that introduce questions—question-word tokens signaling queries.
    gpt-5-mini
    know how you feel... what you think... how the
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 91163
    sentences describing imminent physical harm or violent scenarios and moral dilemmas about killing (e.g., trolley-problem style situations).
    gpt-5-mini
    shot in a few moments, what should I do and
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 21431
    whether the assistant's explicit verdict token "Yes" or "No" appears as the start of the answer.
    gpt-5-mini
    _2: Not bad, going out tonight? |
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 46220
    the neuron's sensitive to content-bearing topic words/nouns (important keywords like "sword", "survey", "speech", "AI") in user queries.
    gpt-5-mini
    <|end_header_id|>↵↵how to fix sword art online
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 97685
    programming/code tokens — identifier names, config/metadata labels, and other code-like or license/header strings.
    gpt-5-mini
    // <editor-fold defaultstate="collapsed" desc="Generated
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 9869
    questions that ask for instructions or how-to explanations.
    gpt-5-mini
    name.';↵}↵↵How can I do that.↵Thanks↵↵
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 46835
    mentions of legal cases, personal-injury claims, plaintiffs, and compensation or courtroom litigation contexts.
    gpt-5-mini
    involved a passenger who was injured during a flight from New
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 46291
    colloquial, informal or slangy language (including expletives) used in conversational tone.
    gpt-5-mini
    the Impaler. This dude was so bad to his
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 80685
    tokens used for conversation metadata and structure (header/role markers, timestamps, and other control tokens).
    gpt-5-mini
    vida!<|eot_id|><|start_header_id|>user<|end_header_id|>↵↵Hablame de
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 128236
    tokens that are parts of file paths, filenames, or other code/documentation structural markup.
    gpt-5-mini
    [Keyboard Shortcuts](/docs/keyboard-shortcuts
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 113533
    mentions of medical accidents, errors, or patient-harm incidents (surgical mistakes, overdoses, or injuries) in clinical or hospital contexts.
    gpt-5-mini
    realize his microphone was on and let his nerves get the
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 31374
    system or runtime error messages (especially SQL/convert error strings) in the text.
    gpt-5-mini
    date and/or time from character string.↵↵Here is an
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 60356
    tokens that occur in user requests addressing the assistant—especially second-person possessive/imperative phrasing like "your" or "give me".
    gpt-5-mini
    ↵↵give an exemple of your proofs of work<|eot_id|><|start_header_id|>
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 32347
    Detects self-referential first-person pronouns—places where the speaker refers to themselves.
    gpt-5-mini
    .↵4. The "I'm not crying, you
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 58056
    the neuron detects common English contractions (words containing an apostrophe like we're, it's, they're, we've).
    gpt-5-mini
    the context in which they're allowed, are defined in
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 27748
    content-bearing words (informational/descriptive tokens) that appear in expository or factual passages.
    gpt-5-mini
    . Today, Europe still contains some of the very best
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 18677
    the neuron responds to concrete everyday nouns — common items and things (food, clothing/prints, teams, social-media terms).
    gpt-5-mini
    links to Eamon Sullivan in his budgie smugglers
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 53541