    Neuronpedia

    © Neuronpedia 2025
    EXPLANATION TYPE
    oai_token-act-pair
    Description
    OpenAI's automated interpretability method, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to support new models and context windows.
    Author
    OpenAI
    URL
    https://github.com/hijohnnylin/automated-interpretability
    Settings
    Default prompts from the main branch, using the TokenActivationPair strategy.
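As a rough illustration of what a token-activation-pair strategy hands to the explainer model, the sketch below pairs each token with a discretized activation level. The function names, the 0-10 scale, the floor rounding, and the tab-separated layout are assumptions made for this example, not details confirmed by this page; the linked repository is the place to check the actual prompt format.

```python
import math
from typing import List


def discretize(activations: List[float], max_activation: float, scale: int = 10) -> List[int]:
    """Map raw activations to integers in [0, scale]; negatives clamp to 0."""
    if max_activation <= 0:
        return [0] * len(activations)
    return [
        min(scale, math.floor(scale * max(a, 0.0) / max_activation))
        for a in activations
    ]


def format_pairs(tokens: List[str], activations: List[float], max_activation: float) -> str:
    """Render one 'token<TAB>level' line per token (hypothetical layout)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    levels = discretize(activations, max_activation)
    return "\n".join(f"{tok}\t{lvl}" for tok, lvl in zip(tokens, levels))


# Made-up activation values for a short token sequence (illustrative only).
tokens = [" didn", "'t", " find"]
activations = [4.0, 0.5, 0.0]
print(format_pairs(tokens, activations, max_activation=4.0))
```

Discretizing keeps the prompt compact and lets the explainer compare tokens on a common scale, at the cost of hiding small differences between weakly activating tokens.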
    Recent Explanations
    first-person self-referential statements (the author saying what they did, need, or want).
    gpt-5
    importing from CSV file↵↵I am reading in values from
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 106018
    simplified, analogy-driven explanations in a conversational, second-person style (often “explain like I’m X” content).
    gpt-5
    slide down to the ground. This is similar to the
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 380
    the first content word at the start of a new passage or paragraph
    gpt-5
    the information above.↵A 20 year old female patient
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 77844
    parenthetical disambiguators after a person’s name in titles, especially those indicating occupation/sport or location.
    gpt-5
    <|begin_of_text|>Matt Wilson (footballer)↵↵Matthew Wilson,
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 3951
    sentence-initial discourse markers that introduce examples, explanations, or contextual framing.
    gpt-5
    at the <160>.↵Here is a dump of the
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 13977
    base stems of English contractions (especially negation forms before the apostrophe).
    gpt-5
    looked in them yesterday and didn't find a virgin queen
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 35269
    mentions of authentication and access control, especially device/app locking and unlocking mechanisms, credentials, and security features (passcodes, PINs, biometrics, encryption/keys).
    gpt-5
    Touch ID or your customizable PIN.↵↵Automatic Sync↵↵If
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 53078
    punctuation and connective markers that indicate sentence or clause boundaries.
    gpt-5
    my score is still good. PA and DC are fairly
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 59559
    statements asserting that an action “can be” safely performed (e.g., removed/ignored/closed) without negative consequences.
    gpt-5
    so the explicit lock can be removed.↵↵3. For
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 124756
    assistant-role headers/markers indicating the start of an assistant message in chat-formatted text.
    gpt-5
    are your?<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵I am Vic
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 106797
    hardware/electronics-style alphanumeric identifiers and acronyms—such as model numbers, part codes, and versioned interface/spec tokens
    gpt-5
    U0058 & HCMODU005↵↵Thanks
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 99951
    first-person self-referential language, especially collective or possessive references to the author or their group.
    gpt-5
    to load test one of our.net webservices.↵
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 60540
    end-of-sentence or clause boundary tokens—words with attached punctuation (commas, periods, quotes/parentheses) and special end-of-turn markers.
    gpt-5
    I can input at once<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵It
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 75008
    statements about cybersecurity and privacy risks, especially vulnerabilities that expose sensitive information or credentials and enable attacks.
    gpt-5
    artext auszulesen. Als Basis dafür d
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 77303
    markers that denote the start of an assistant response in the chat transcript (assistant role boundaries).
    gpt-5
    1)<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵To compute the
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 10573
    mentions of mines and the Minesweeper game (including demining/mine‑sweeping contexts)
    gpt-5
    an example of a simple Minesweeper game implemented in
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 100798
    mentions of safety hazards involving ingestion, choking, or suffocation, especially warnings about small parts and toy safety for children and pets.
    gpt-5
    parts that might be ingested or removed.↵Avoid toys
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 53689
    text fragments containing non-alphanumeric or non-ASCII symbols (e.g., currency signs, casting/operators, or templating/markup delimiters) within otherwise plain text.
    gpt-5
    _form_sub.receipt_dt::text::date, f
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 113041
    function words, especially the definite article signaling the start of a noun phrase
    gpt-5
    and rabid animals into the story<|eot_id|><|start_header_id|>assistant<|end_header_id|>
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 34927
    chat-formatting header markers that indicate the start of an assistant response.
    gpt-5
    ↵↵how to fish<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵To fish
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 85258