EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's automated interpretability method, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to support new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, using the TokenActivationPair strategy.
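As the strategy name suggests, the explainer model sees text as a sequence of tokens, each paired with the neuron's activation on that token. Below is a minimal sketch of that idea. It assumes activations are clipped at zero and rescaled to integers from 0 to 10 relative to a maximum activation, and that each record is wrapped in `<start>`/`<end>` markers. The function names, the scaling rule, and the delimiters are illustrative assumptions. They are not the repository's exact prompt text or API.

```python
import math
from typing import Sequence

# Hypothetical helpers: a sketch of rendering token/activation pairs for an
# explainer prompt. The 0-10 scale and <start>/<end> markers are assumptions.


def normalize_activations(activations: Sequence[float], max_activation: float) -> list[int]:
    """Clip negatives to 0 and rescale to integers in [0, 10] relative to max_activation."""
    if max_activation <= 0:
        return [0 for _ in activations]
    return [min(10, math.floor(10 * max(a, 0.0) / max_activation)) for a in activations]


def format_token_activation_pairs(
    tokens: Sequence[str], activations: Sequence[float], max_activation: float
) -> str:
    """Render one activation record as tab-separated 'token<TAB>activation' lines."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    scaled = normalize_activations(activations, max_activation)
    lines = [f"{tok}\t{act}" for tok, act in zip(tokens, scaled)]
    return "<start>\n" + "\n".join(lines) + "\n<end>"


if __name__ == "__main__":
    # Made-up activations for illustration only; not real data for any feature.
    print(format_token_activation_pairs(
        ["at", " St", " James", "'", " Park"], [0.0, 2.0, 4.0, 1.0, -1.0], 4.0
    ))
```

Discretizing activations keeps the prompt short and gives the explainer a consistent scale across records. A real pipeline would usually take the maximum over all records for a neuron rather than over a single snippet.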
Recent Explanations
mentions of genetic variants and molecular biology assay components within technical scientific contexts.
gpt-5
-antitrypsin gene probe. In eight patients with
GEMMA-2-2B
0-CLT-HP
INDEX 478
mentions of CEOs and executive leadership in corporate news or forward-looking business statements.
gpt-5
, and other forward-looking statements in these remarks,
GEMMA-2-2B
0-CLT-HP
INDEX 437
The neuron detects capitalized named-entity tokens (proper nouns like people, places, teams, organizations).
gpt-5-mini
at St James' Park, after which he said:
GPT2-SMALL
8-RES-JB
INDEX 55
commas that mark clause breaks or parenthetical/afterthought phrases within a sentence.
gpt-5
at St James' Park, after which he said:
GPT2-SMALL
8-RES-JB
INDEX 55
The neuron detects language describing the formation or execution of habits and routine behaviors.
o4-mini
behaviors, these behaviors often become habitual and so routine that
LLAMA3.1-8B-IT
19-RESID-POST-AA
INDEX 99130
This neuron responds to list enumeration markers, in particular digits (and their trailing punctuation) that denote numbered list items.
o4-mini
Technical breakout↵10. Economic recovery↵↵Events before a
LLAMA3.1-8B-IT
19-RESID-POST-AA
INDEX 59638
This neuron responds to dynamic movement verbs—especially those describing traversing through an environment (e.g. “fly through the waters”).
o4-mini
eat with them or fly through the waters like one of
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 59638
the start of an assistant's reply or the beginning of a structured assistant response (tokens marking the assistant role/response).
gpt-5-mini
it?<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵There could be several
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 101129
This neuron activates on topical content words—concrete nouns and domain-specific terms (e.g., objects, places, activities, and technical or subject-specific vocabulary).
gpt-5-mini
i dont want to have people in my painting<|eot_id|><|start_header_id|>
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 123331
It detects the start of assistant replies / tokens marking an assistant speaker turn in the conversation.
gpt-5-mini
you<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵Hello! I'm Assistant
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 52882
Tokens at the start of an assistant-generated reply (the boundary/marker indicating a model/assistant response).
gpt-5-mini
?<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵There are several cem
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 122494
technical or domain-specific terminology (jargon) — i.e., tokens from technical descriptions, code, networking, or scientific/patent language.
gpt-5-mini
appropriate for your internal network, but make sure it is
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 31547
spots meta-commentary about repetition or rehashing of arguments or topics (mentions that something is being repeated or raised again).
gpt-5-mini
rolled his eyes. Not this again. The anti-A
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 122273
This neuron detects conversational greetings and salutations (tokens used to say “hello” or open a conversation).
gpt-5-mini
<|start_header_id|>assistant<|end_header_id|>↵↵Hello! Is there something specific you
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 92688
instances where the text signals prior occurrence or familiarity (that something or someone has been experienced or seen before).
gpt-5-mini
, regardless of whether it has to do with sex,
LLAMA3.1-8B-IT
19-RESID-POST-AA
INDEX 30532
the neuron responds to repeated or boilerplate text—tokens that appear many times in a duplicated/templated phrase or repeatedly reiterated sentence fragments.
gpt-5-mini
close to being fully grown, but he just needed a
LLAMA3.1-8B-IT
19-RESID-POST-AA
INDEX 31284
This neuron detects self-referential mentions of the assistant’s identity (e.g. “AI language model”).
o4-mini
assistant<|end_header_id|>↵↵As an AI language model, my knowledge
LLAMA3.1-8B-IT
7-RESID-POST-AA
INDEX 85651
the word "science" in contexts related to science fiction.
claude-4-5-sonnet
* **Der Wissenschaftler:** Nutzt Intelligen [German: "The scientist: Uses intellig…"]
GEMMA-3-27B-IT
13-GEMMASCOPE-2-TRANSCODER-262K
INDEX 165940
key technical or proper terms that stand out in structured text (often emphasized in lists, tables, or quotes)
gpt-5
Magenta, Yellow, Key (black) | The
GPT-OSS-20B
15-RESID-POST-AA
INDEX 22
technical or domain-specific vocabulary and key terminology appearing in academic and professional documentation.
claude-4-5-haiku
Magenta, Yellow, Key (black) | The
GPT-OSS-20B
15-RESID-POST-AA
INDEX 22