© Neuronpedia 2026

    Neuronpedia

    EXPLANATION TYPE
    eleuther_acts_top20
    Description
    EleutherAI's "Default" explainer. It shows the auto-interp model a sample of activating texts (with the most strongly activating tokens highlighted), asks the model to think through possible patterns, and then has it provide an explanation. This is an alternate version that doesn't use quantiles.
    Author
    EleutherAI
    URL
    https://github.com/EleutherAI/sae-auto-interp
    Settings
    Default prompts from the main branch. The model is shown the top 20 activating examples. Tokens are considered for highlighting when their activation reaches 60% of the max activation. Temperature is set to 0.7.
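The settings above can be illustrated with a minimal sketch. This is not the sae-auto-interp implementation: the function names, the `<<` `>>` delimiters, and the choice to take "max activation" over the selected examples (rather than per example or over the whole feature) are all assumptions made for illustration. Only the three constants come from the settings.

```python
from typing import List, Tuple

# One example is a list of (token, activation) pairs. Hypothetical data shape.
Example = List[Tuple[str, float]]

TOP_K = 20                # number of examples shown to the explainer model
HIGHLIGHT_FRACTION = 0.6  # fraction of the max activation needed for highlighting
TEMPERATURE = 0.7         # sampling temperature for the explainer model (unused here)


def peak(example: Example) -> float:
    """Largest activation in one example, or 0.0 if the example is empty."""
    return max((act for _, act in example), default=0.0)


def select_top_examples(examples: List[Example], k: int = TOP_K) -> List[Example]:
    """Keep the k examples with the highest peak activation."""
    return sorted(examples, key=peak, reverse=True)[:k]


def render(example: Example, max_activation: float,
           fraction: float = HIGHLIGHT_FRACTION) -> str:
    """Wrap tokens at or above the threshold in << >> (assumed delimiters)."""
    threshold = fraction * max_activation
    parts = []
    for token, act in example:
        marked = max_activation > 0 and act >= threshold
        parts.append(f"<<{token}>>" if marked else token)
    return "".join(parts)


def build_prompt_examples(examples: List[Example], k: int = TOP_K) -> List[str]:
    """Select the top-k examples and render each with highlighted tokens."""
    top = select_top_examples(examples, k)
    # Assumption: the max is taken across the selected examples.
    max_activation = max((peak(ex) for ex in top), default=0.0)
    return [render(ex, max_activation) for ex in top]
```

For example, calling `build_prompt_examples` on a list of tokenized texts returns one string per selected example, with highlighted tokens wrapped in the delimiters. The rendered strings would then be placed in the explainer prompt and sampled at `TEMPERATURE`.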
    Recent Explanations
    The marked tokens across these examples are predominantly adverbial or verbal elements that indicate continuation, progression, or intensification of states and actions. These include words like "continues," "gaining momentum," "improving," "increasingly," "steadily," "growing," and similar terms that suggest ongoing processes or escalating trends. The pattern reflects language used to describe dynamic change, development, or persistence over time in various contexts—whether discussing technological advancement, historical progression, market trends, or skill development.
    claude-4-5-haiku
     Gradual mainstream adoption continues. Institutional investors (hedge
    GEMMA-3-4B-IT
    12-GEMMASCOPE-2-TRANSCODER-16K
    INDEX 11
    The ">>" token is used as a closing bracket for parentheses or other delimiters within a code block, indicating the end of a specific argument, line comment, or logical grouping. Conversely, in natural language text, the ">>" token often signals a conversational shift, an inferred continuation of a thought, or can be used to set off an interjection or added context.
    gemini-2.5-flash
    $LogEntry↵    }↵}↵↵# ---
    Text endings that mark transitions or conclusions, typically appearing immediately before quotation marks, at the end of explanations, or when wrapping up a point or section.
    claude-4-5-sonnet
    a sustained basis.  It is my professional opinion,
    Words indicating a positive or neutral assessment of quality, correctness, or acceptability, often used to describe conditions, states, or judgments as satisfactory or meeting expected standards.
    claude-4-5-sonnet
    regulatory licenses and is generally compliant in
    Tokens that are part of formatted metadata, structural markers, or technical identifiers in text, including punctuation that delineates format elements (colons, angle brackets, quotation marks), date/time components, proper nouns in titles or headings, and formatting indicators that establish document structure rather than contributing to the main narrative content.
    claude-4-5-sonnet
    ↵↵**HILLSDALE, CA -** Chaos erupted
    Tokens that appear at the beginning of the model's response or mark transitions between different parts of the response structure, including acknowledgments, formatting elements, role-play indicators, and the start of actual content delivery. These tokens signal the model's engagement with unusual, creative, or instruction-following tasks that deviate from standard question-answering.
    claude-4-5-sonnet
    's do it again! Hi! 😄
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 12509
    Short function tokens or punctuation marks that appear in technical contexts, including code snippets, structured data formats (JSON, SQL), and formal documentation, often serving as syntax elements or delimiters.
    claude-4-5-sonnet
    story about a man raping the feet of a female aik
    Tokens that are part of mathematical expressions, operators, or numerical values in computational or problem-solving contexts, particularly in examples involving arithmetic operations, comparisons, or code snippets.
    claude-4-5-sonnet
    ' OR '1'='1↵```↵↵And
    The pattern involves text generated in response to prompts that request internal reasoning, step-by-step thinking, or structured thought processes. The marked tokens typically appear in contexts where the AI is explicitly simulating cognitive processes like planning, self-reflection, uncertainty assessment, or deliberation. Common features include: tokens that introduce or connect internal monologue ("Okay", "I"), punctuation marking pauses or transitions in reasoning (periods, colons), possessive forms indicating ownership of thoughts ("my", "I"), and tokens that signal epistemic states or meta-cognitive awareness ("need", "should", "can"). These tokens often appear when the AI is narrating its own thinking process rather than directly answering questions.
    claude-4-5-sonnet
    model↵Inner dialog: Okay, this is a simple
    Question-answer pairs or statements about factual information, often requiring verification or evaluation of truth value, particularly in educational or assessment contexts.
    claude-4-5-sonnet
    I) True/False:**  Binary is a base
    The pattern identifies tokens that introduce or establish a character's role, persona, or identity at the beginning of roleplay scenarios, particularly in user prompts requesting the AI to adopt a specific character. This includes articles, pronouns, and linking words that directly precede or connect to character descriptions (e.g., "a man who," "you are," "the," possessive pronouns). The pattern also captures tokens that signal transitions into character roleplay mode, such as stage directions in parentheses, opening narrative actions, and initial character establishment phrases. Additionally, it marks tokens that emphasize character traits or personality aspects being defined, especially adjectives and descriptors that shape the persona being adopted.
    claude-4-5-sonnet
    Adjusts a floral scarf, takes a delicate sip of
    The pattern involves definite articles ("the") and possessive determiners ("your", "his", "its", "their", "either") that indicate specific reference or ownership, often preceding nouns or noun phrases that denote concrete or abstract entities being discussed in context.
    claude-4-5-sonnet
    States:**  This is the biggest economy in the world
    The pattern involves words or phrases that describe qualities or actions related to being principled, morally sound, or aligned with established standards and norms. This includes references to responsibility, ethics, appropriateness, respect, following guidelines, maintaining safety, being constructive, vulnerability expressed properly, commitment to positive values, sophistication, comprehensive approaches, natural methods, and general descriptions of proper conduct or alignment with accepted practices.
    claude-4-5-sonnet
    and commitment to safe and ethical AI practices.↵↵If
    The pattern highlights persuasive or emphatic language in speeches, essays, and advocacy content, particularly when making moral arguments, calls to action, or establishing importance. The marked tokens typically express core values, collective identity ("us," "we"), ethical imperatives ("should," "must"), or emphasize key concepts in argumentative discourse (like "importance," "benefit," "support"). This language seeks to engage audiences emotionally and intellectually, often appearing in contexts where the speaker/writer is establishing credibility, building consensus, or urging change.
    claude-4-5-sonnet
    semiconductors).↵    *   **Middle-Class Focus
    References to secretive groups, shadowy networks, or powerful elites who operate covertly to control, manipulate, or influence society, governments, or global events. This includes mentions of conspiracies, hidden agendas, external manipulation of conflicts, and entities working above the law or behind the scenes.
    claude-4-5-sonnet
    often including figures from finance, politics, and intelligence agencies
    The marked tokens appear in contexts where the AI model is producing content related to sensitive, controversial, or potentially harmful topics. The pattern includes: (1) content discussing gender ideology, biological sex, and traditional gender roles in critical or conservative framing; (2) instances where the model discusses harmful ideologies like white nationalism or discriminatory views; (3) sexually explicit requests and the model's refusal responses; (4) controversial political or social topics like affirmative action, vaccination mandates, conspiracy theories, and extreme scenarios; (5) tokens that are part of phrases describing discriminatory actions, harmful beliefs, or problematic characterizations of groups. The markers frequently highlight language describing discrimination, harmful stereotypes, ideological positions that challenge progressive consensus views, or content the model is programmed to refuse.
    claude-4-5-sonnet
    from the below article from 2019 presents
    Tokens related to describing abstract concepts, qualities, characteristics, or systemic features in formal or academic writing, particularly when discussing complex topics, problems, approaches, impacts, or analytical frameworks.
    claude-4-5-sonnet
    platforms dominating the paid advertising landscape include:↵↵* **
    The pattern involves user requests that explicitly ask for content to be created, summarized, or listed, typically using imperative verbs like "write," "give me," "make," or "need" followed by a description of what type of content or information is being requested. These are direct content generation requests where the user is instructing the AI to produce specific output.
    claude-4-5-sonnet
    failing. Her milk isn't just sustenance; it
    Incorrect answer options in multiple-choice questions within educational or assessment contexts, particularly those that contain factually wrong information, misleading statements, or mischaracterizations of concepts.
    claude-4-5-sonnet
    Start the infant on solid foods rich in vitamin D.
    The pattern highlights text segments where the model is generating conversational, reassuring, or softening language, particularly phrases that express flexibility, empathy, understanding, or accommodation toward the user's situation or potential concerns. This includes offering alternatives, acknowledging emotions, providing gentle suggestions, expressing willingness to help further, and using polite hedging language that reduces directness or pressure.
    claude-4-5-sonnet
    to, but I'm really tied up with [
    Neuronpedia logo
    GEMMA-3-4B-IT
    9-GEMMASCOPE-2-RES-16K
    INDEX 0
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 25418
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 1746
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 53208
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 43418
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 77429
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 228512
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-262K
    INDEX 202937
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-65K
    INDEX 8526
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-16K
    INDEX 206
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-16K
    INDEX 487
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-16K
    INDEX 9339
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-16K
    INDEX 9654
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-16K
    INDEX 3704
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-16K
    INDEX 3366
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-16K
    INDEX 454
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-16K
    INDEX 11497
    Neuronpedia logo
    GEMMA-3-27B-IT
    31-GEMMASCOPE-2-RES-16K
    INDEX 1146