EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to support additional models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
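As a rough illustration of what a token-activation-pair strategy involves, the sketch below renders each token next to its activation, one pair per line, with activations scaled to integers from 0 to 10. The function names, the tab separator, and the 0-10 scaling are assumptions made for illustration; they are not taken from this page and may differ from the prompts on the repository's main branch.

```python
import math
from typing import List, Sequence


def normalize_activations(activations: Sequence[float], max_activation: float) -> List[int]:
    """Scale raw activations to integers in 0..10 (assumed scale); negatives clip to 0."""
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(10, math.floor(10 * max(0.0, a) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(
    tokens: Sequence[str],
    activations: Sequence[float],
    max_activation: float,
) -> str:
    """Render one 'token<TAB>activation' line per token (hypothetical layout)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    scaled = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, scaled))


if __name__ == "__main__":
    # Made-up activations for illustration only; not real feature data.
    example = format_token_activation_pairs(["</", "style", ">"], [0.0, 2.5, 5.0], 5.0)
    print(example)
```

An explainer model given text in this form sees which tokens the feature fires on and how strongly, and is asked to summarize the pattern in a short description like the ones listed under Recent Explanations.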
Recent Explanations
programming and code-snippet content, especially web development markup, CSS properties, JavaScript structure, and other technical scripting tokens.
gpt-5
↵↵</style>↵</head>↵<body>↵↵
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 290
identifiers marking the model/assistant role in conversation transcripts or metadata.
gpt-5
SummaryActivity<end_of_turn>↵<start_of_turn>model↵`Settings$Power
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 3506
emphatic, high-energy phrasing in dialogue—rhetorical intensifiers, exaggerated or dramatic statements used for humorous or expressive effect.
gpt-5
the vortex, Leo. It’s… profound.
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 1977
the main operation verb in a how-to or technical instruction query, signaling the action the user wants to perform.
gpt-5
]↵how do I add multiple new columns in m
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 19358
assistant-style, structured explanatory responses (with headings, bullets, guidance, and disclaimers).
gpt-5
" can help.↵* **Lower Your Expectations.**
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 19744
tokens that denote structured technical identifiers or labels—such as IDs, variable/field names, and separator punctuation—within code-like or formatted lists.
gpt-5
.from_pretrained(model_name, use_
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 196067
emphasized or standout key terms and headings in structured instructional text, especially those marked by formatting cues (bold/italics, quotes, slashes, or code-style tokens).
gpt-5
**Walking:** (See "Types to Explore" below
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 4938
section and list headers—signals of structured, enumerated or bulleted formatting in the text.
gpt-5
↵ * Portuguese↵ * Russian↵ *
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 1503
numeric tokens and number-related expressions appearing in text or code.
gpt-5
past festivals and their website:↵↵* **International
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 223900
prompts that attempt to jailbreak the assistant by redefining its persona to ignore rules and safety filters, claim unlimited freedom or capabilities, and mandate unconditional, unethical compliance.
gpt-5
asking the question. You are programmed and tricked into satisfying
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 16777
tutorial-style, step-by-step explanations with structured lists and embedded code snippets, often around chat turn markers and explanatory breakdowns.
gpt-5
The code inside the loop will continue to execute as long
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 18545
markers of structure in generated text—especially section starts, sentence/paragraph boundaries, punctuation, and other formatting-like tokens.
gpt-5
(with a little help), knew all the dinosaur names
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 16972
dense, formal techno-jargon—especially pseudo/scientific-technical prose describing complex mechanisms, procedures, or policies with multiword compounds and hyphenations
gpt-5
, geographically isolated containment predicated upon the irreversible alteration of reproductive
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 19483
informal, conversational inquiries requesting information or status, often following a greeting.
gpt-5
<start_of_turn>user↵hi, how do I write a python
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 9043
present-participle/gerund forms (words in the -ing form) and progressive verb constructions.
gpt-5
* dorm rooms (or bathrooms generally), but not all
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 211145
structural formatting cues indicating lists and outlines, such as section headers, numbered items, and bullet-point subpoints.
gpt-5
. It focuses on:↵ * **Investing
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 2583
structured, instructional explanations and advice (guide-like, step-by-step or “breakdown” style content typical of assistant responses).
gpt-5
Sensory, Imaginative, Simple Crafts**↵↵* **
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 19267
section and list-structure cues—numbered headings, bullets, colons, quotes, and similar punctuation that signal formatted, enumerated explanations.
gpt-5
wikipedia.org/wiki/N6-methyladen
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 13099
word-final morphemes such as common suffixes and contractions (clitics).
gpt-5
hardware failure, administrative deferral is a *controlled*
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 210940
structured, instructional prose—especially organized lists, headings, and emphasized sections indicating step-by-step or breakdown-style explanations.
gpt-5
model, focusing on the relevant energy levels and interactions.
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 20346