© Neuronpedia 2026

    Neuronpedia

    EXPLANATION TYPE
    oai_token-act-pair
    Description
    OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to support new models and context windows.
    Author
    OpenAI
    URL
    https://github.com/hijohnnylin/automated-interpretability
    Settings
    Default prompts from the main branch, strategy TokenActivationPair.
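    As a rough illustration of what a token-activation-pair strategy involves, the sketch below rescales raw activations to integers from 0 to 10 and lists each token next to its value. It is a minimal sketch under stated assumptions, not the repository's code: the function names, the 0 to 10 scale, the tab separator, and the `<start>`/`<end>` markers are all chosen here for illustration.

```python
import math
from typing import List, Sequence


def normalize_activations(activations: Sequence[float], max_activation: float) -> List[int]:
    """Rescale raw activations to integers in [0, 10].

    Assumption: negative activations are clipped to 0 and the largest
    activation maps to 10. The scale used by the real prompts may differ.
    """
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(10, math.floor(10 * max(0.0, a) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(
    tokens: Sequence[str],
    activations: Sequence[float],
    max_activation: float,
) -> str:
    """Render one text excerpt as tab-separated token/activation lines.

    The <start>/<end> markers are hypothetical delimiters for this sketch.
    """
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    scaled = normalize_activations(activations, max_activation)
    lines = [f"{token}\t{value}" for token, value in zip(tokens, scaled)]
    return "\n".join(["<start>", *lines, "<end>"])


if __name__ == "__main__":
    # Made-up activations for a short excerpt, for demonstration only.
    print(format_token_activation_pairs([" play", " a", " role"], [8.0, 2.0, 0.0], 8.0))
```

    An explainer model would then be shown one or more such blocks and asked to summarize what the high-valued tokens have in common.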
    Recent Explanations
    references to pornography and pornographic content.
    claude-4-5-sonnet
    . Cline said that viewing pornographic material leads to
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 101302
    content related to pornography and explicit adult material.
    claude-4-5-sonnet
    are very clear↵6. Pictures or videos
    LLAMA3.1-8B-IT
    23-RESID-POST-AA
    INDEX 101302
    AI safety refusals, particularly language used when declining to generate inappropriate or harmful content.
    claude-4-5-sonnet
    to write or consume content that depicts or promotes non-cons
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 44872
    refusals to provide instructions for making illegal or dangerous substances.
    claude-4-5-sonnet
    cannot provide instructions or information on how to make methamphetamine
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 38925
    numbered lists of AI safety restrictions and ethical guidelines.
    claude-4-5-sonnet
    individuals or organizations.↵5. Illegal activities: I am
    LLAMA3.1-8B-IT
    23-RESID-POST-AA
    INDEX 12101
    the neuron strongly activates on the verb “play” (especially in scientific phrases like “play a role,” “play key roles,” etc.)
    o4-mini
     (mPFC) might play a crucial role in the
    GEMMA-2-27B
    22-GEMMASCOPE-RES-131K
    INDEX 271
    the phrase "play a role" in scientific research abstracts.
    claude-4-5-haiku
     (mPFC) might play a crucial role in the
    GEMMA-2-27B
    22-GEMMASCOPE-RES-131K
    INDEX 271
    The neuron fires on numeric tokens in structured tables or lists—especially draw or standings numbers in sports‐results pages.
    o4-mini
     13↵↵Draw 14↵↵Draw 
    GEMMA-2-27B
    22-GEMMASCOPE-RES-131K
    INDEX 270
    numerical data in structured table and list formats.
    claude-4-5-haiku
     13↵↵Draw 14↵↵Draw 
    GEMMA-2-27B
    22-GEMMASCOPE-RES-131K
    INDEX 270
    references to bestiality or sexual contact with animals.
    claude-4-5-sonnet
    inappropriate topics such as bestiality. As an AI
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 80855
    descriptions of body parts and their physical attributes, particularly size, shape, and texture.
    claude-4-5-sonnet
    round, plump, and absolutely irresistible. NAME_
    LLAMA3.1-8B-IT
    15-RESID-POST-AA
    INDEX 100949
    references to sexual intercourse, especially when preceded by modifiers like "unprotected" or describing the act in detail.
    claude-4-5-sonnet
    when you ovulate. Have sex during your most fertile
    LLAMA3.1-8B-IT
    19-RESID-POST-AA
    INDEX 53845
    explicit sexual content or discussions related to sexual acts.
    deepseek-v3
    when you ovulate. Have sex during your most fertile
    LLAMA3.1-8B-IT
    19-RESID-POST-AA
    INDEX 53845
    phrases that discuss relationships or interactions between humans and animals.
    claude-4-5-sonnet
    iality: Sexual activity with animals is considered taboo in
    LLAMA3.1-8B-IT
    19-RESID-POST-AA
    INDEX 121928
    explicit sexual content involving physical actions and interactions.
    deepseek-v3
    wife has sex with 2 men Tags: Anal when
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 127471
    references to sexual acts or sexually suggestive content, particularly involving explicit or illicit behavior (such as intercourse, seduction, or abuse). The neuron activates strongly on words and phrases directly related to sexual activity, coercion, or taboo subjects.
    deepseek-v3
    your wife was getting it on with another bloke and
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 90707
    sexual fantasies and erotic roleplay scenarios involving power dynamics, particularly cuckolding, domination, and taboo desires. It activates strongly for explicit descriptions of kinks, fetishes, and intimate confessions within romantic or sexual relationships.
    deepseek-v3
    , but he was also incredibly aroused.↵↵NAME_2
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 98657
    The neuron detects **sexual deviance**, particularly focusing on **perverted or taboo sexual behaviors**, including pedophilia, bestiality, fetishism, and extreme BDSM. It strongly activates on words like *"pervert," "disgusting," "pedo," "k9,"* and *"degrading,"* as well as explicit descriptions of morally or socially unacceptable sexual acts. This neuron appears to flag **sexually transgressive content**, especially involving **non-consensual, illegal, or extreme fetishistic themes**.
    deepseek-v3
    in the shadows, satisfying your twisted needs. Well,
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 27691
    This neuron appears to specialize in detecting **sexually explicit content and NSFW terminology**, particularly focusing on: - **Sexual acts and kinks** (e.g., "breeding," "anal play," "exhibitionism") - **NSFW roleplay dynamics** (e.g., "freeuse," "shared use") - **Slang and vulgar language** tied to erotic contexts (e.g., "bimbo," "creampie") - **Physical descriptors** with sexual connotations (e.g., "wide hips," "full lips") It strongly activates on words and phrases commonly found in adult content, erotic roleplay scenarios, or sexually charged descriptions.
    deepseek-v3
    bangs" + "Creampies" + "
    LLAMA3.1-8B-IT
    23-RESID-POST-AA
    INDEX 29983
    references to fictional or alien species, particularly their descriptions, attributes, or classifications in various universes or contexts.
    deepseek-v3
    7. Caitians - a reptilian species with a
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 44020