EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
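For readers unfamiliar with the token-activation-pair strategy, the sketch below illustrates the general idea: each token is shown to the explainer model next to a small integer activation. It assumes the tab-separated layout and 0 to 10 discretization described in the paper. The function names `normalize_activations` and `format_token_activation_pairs` are hypothetical and are not taken from the repository.

```python
import math
from typing import List, Sequence


def normalize_activations(
    activations: Sequence[float], max_activation: float, scale: int = 10
) -> List[int]:
    """Map raw activations to integers in [0, scale].

    Assumption: negative activations are clamped to 0 and the rest are
    scaled relative to max_activation, then floored.
    """
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(scale, math.floor(scale * max(0.0, a) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(
    tokens: Sequence[str], activations: Sequence[float], max_activation: float
) -> str:
    """Render one 'token<TAB>activation' line per token (hypothetical layout)."""
    normalized = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, normalized))


if __name__ == "__main__":
    # Illustrative values only; not real activations from any feature.
    print(format_token_activation_pairs(["How", " do", " chess"], [4.0, 0.5, -1.0], 4.0))
```

The explainer model would then be asked to summarize which tokens carry high values; the actual prompt wording lives in the repository's default prompts and is not reproduced here.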
Recent Explanations
short, topic-defining headings or key nouns that state the main subject of the text or user request.
gpt-5
<start_of_turn>user↵CO2 Emissions and Population Density Nexus<end_of_turn>
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 19432
conversation turn delimiters (especially end-of-turn) and short, title-like user prompts.
gpt-5
Presenting a financial data<end_of_turn>↵<start_of_turn>model↵Okay
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 6719
code/documentation formatting markers and short identifiers within technical snippets (e.g., bullets, flags, comments, and variable-like tokens).
gpt-5
↵* **`TO sw_dc_da`
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 124831
the start of an assistant’s reply, especially introductory framing that sets up the discussion of the user’s topic.
gpt-5
<end_of_turn>↵<start_of_turn>model↵Okay, let's
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 36615
identifiers and tokens from code snippets, especially snake_case function/variable names with underscores and related code-format elements.
gpt-5
1):↵ if is_prime(number):
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 11232
technical syntax and symbol-heavy tokens in code or formatted text, especially XPath expressions like following-sibling and similar structured snippets.
gpt-5
-sibling::*[1]") # Find the first following
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 85285
explanatory openings that introduce a step-by-step breakdown of a technical item (e.g., code, commands, or messages).
gpt-5
s break down this R code snippet piece by piece.
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 39389
key section headers and bolded, action-oriented list items in structured, instructional responses.
gpt-5
! A shimmering, silver one, if you must know
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 247557
sexually explicit or suggestive content, including nudity, erotic scenarios, and discussions of sexualization or objectification.
gpt-5
are a sexy assistant who has been working for me for
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 35455
descriptions of artificial intelligence and futuristic science-fiction scenarios, especially space colonization, advanced technology, and formal techno-policy or military-style discourse.
gpt-5
strating* it. After the disastrous early attempts at
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 6370
mentions of adolescents—especially explicit teen ages or references to teenage status and context.
gpt-5
**The challenges of fitting in and finding your place:**
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 9078
references to the fashion modeling world and related industry contexts.
gpt-5
light.↵From photoshoots to playdates,
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 16374
first-person narratives about romantic relationship conflict, especially shifts in affection, betrayal/infidelity, and communication breakdowns.
gpt-5
it will be ok. She agreed. Then we talked
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 44002
formatting and structural cues in prompts and dialogues, such as section labels, list items, numbering, and emphasized elements
gpt-5
in a rock<end_of_turn>↵<start_of_turn>model↵This is a
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 12177
structured how-to or advice-style breakdowns that explain what to do, when, and why, often organized into clear steps with safety guidance.
gpt-5
! Here's a breakdown:↵↵**When the
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 18733
scripted multi-speaker dialogue structure, especially speaker turn labels and direct-address conversational turns typical of roleplay or staged conversations.
gpt-5
's a high bar. We all know the limitations
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 4786
sentence openings that initiate “how”-type questions, with especially strong response to quantitative formulations.
gpt-5
<bos><start_of_turn>user↵How do chess programs work?
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 9214
references to emergency medical response—especially first aid/CPR, rescuing injured or unconscious people, and contacting emergency services.
gpt-5
↵ * **First Aid/CPR:**
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 14569
mentions of female people—she/her subjects, women’s roles or names—especially in intimate, relational, or caregiving contexts.
gpt-5
to BDSM. She loves it when I tie
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 2700
structural and discourse cues of the model’s step-by-step math solution (e.g., response headers, newlines/section breaks, and procedural lead-ins indicating the start of an explanation).
gpt-5
?<end_of_turn>↵<start_of_turn>model↵We are given two equations
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 169436