EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's automated interpretability method from the paper "Language models can explain neurons in language models", modified by Johnny Lin to add new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
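The TokenActivationPair strategy presents the explainer model with text excerpts as (token, activation) pairs. Below is a minimal sketch of that formatting step. It is not the repository's code. It assumes that activations are rescaled to integers from 0 to 10 relative to the feature's maximum activation, with negative values clipped to 0, and that each excerpt is written as tab-separated lines between `<start>` and `<end>` markers. The function names `normalize_activations` and `format_token_activation_pairs` are hypothetical.

```python
from typing import List


def normalize_activations(acts: List[float], max_act: float) -> List[int]:
    """Hypothetical helper: rescale activations to integers in [0, 10].

    Assumption: values are scaled relative to max_act, negatives are
    clipped to 0, and anything above max_act is capped at 10.
    """
    if max_act <= 0:
        # No positive reference activation: treat every token as inactive.
        return [0 for _ in acts]
    return [min(10, max(0, round(10 * a / max_act))) for a in acts]


def format_token_activation_pairs(
    tokens: List[str], acts: List[float], max_act: float
) -> str:
    """Hypothetical helper: render one excerpt as token<TAB>activation lines.

    The <start>/<end> delimiters are an assumed prompt layout, not taken
    from the settings above.
    """
    scaled = normalize_activations(acts, max_act)
    lines = ["<start>"]
    lines += [f"{tok}\t{a}" for tok, a in zip(tokens, scaled)]
    lines.append("<end>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy excerpt loosely modeled on the first example below.
    print(format_token_activation_pairs(["13", " =", " 3", "."], [0.0, 2.0, 4.0, -1.0], 4.0))
```

Discretizing to a small integer scale keeps the prompt compact and gives the explainer model a consistent range to compare across excerpts. Any other layout would work for this sketch, provided the same format is used for every excerpt shown to the model.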
Recent Explanations
mathematical expressions and calculations involving variables and numerical operations.
claude-4-5-haiku
13 = 3. And 11 boys would give
DEEPSEEK-R1-DISTILL-LLAMA-8B
17-LLAMASCOPE-SLIMPJ-OPENR1-RES-32K
INDEX 23818
mathematical expressions and equations containing variables and numerical operations.
claude-4-5-haiku
). Again, the neighbors of 3 can be
DEEPSEEK-R1-DISTILL-LLAMA-8B
21-LLAMASCOPE-SLIMPJ-OPENR1-RES-32K
INDEX 29518
domain-specific technical or formal nouns that name systems, processes, fields, measurements, or key entities in a topic (e.g., networks, data, instruments)
gpt-5
This is a type of intensity which is not the property
LLAMA3.1-8B-IT
19-RESID-POST-AA
INDEX 35399
phrases indicating someone assuming control or a role/position, especially “take over as” style leadership or responsibility transitions.
gpt-5
On:↵↵Ryan will take over full-time GM duties↵↵
LLAMA3.1-8B-IT
19-RESID-POST-AA
INDEX 9301
mentions of a specific proper-noun brand/company name (a named brand reference).
gpt-5
, so don’t let Giant’s marketing people put you
LLAMA3.1-8B-IT
19-RESID-POST-AA
INDEX 91616
mentions of entertainment awards and honors, including nominations, wins, categories, and award-show contexts.
gpt-5
including the Genie Awards, The Gemini Awards, The
LLAMA3.1-8B-IT
19-RESID-POST-AA
INDEX 54690
mentions of the legal term “certiorari” in U.S. court case citations or headings.
gpt-5
11th Cir. Certiorari denied.↵↵
LLAMA3.1-8B-IT
19-RESID-POST-AA
INDEX 118424
snippets of source code (programming tokens and identifiers).
gpt-5-mini
keep_fnames: false,↵ mangle:
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 6662
tokens that are parts of systematic chemical compound names (IUPAC-style fragments, numbers and hyphenated segments).
gpt-5-mini
ographic Chemicals↵3-Chloro-6-
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 13279
phrases where the speaker asks for help identifying something (first‑person requests/questions like "can anyone help identify this" or "what is this").
gpt-5-mini
my experience (She says it is a cactus but
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 109563
the neuron detects interrogative/question cues — tokens that start or appear in questions (question words and auxiliaries used to form questions).
gpt-5-mini
Which time period would you choose and why?"<|eot_id|><|start_header_id|>
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 8655
tokens that are part of markup/structural document tags (HTML/XML‑style tags and other structural delimiters).
gpt-5-mini
media="print" />↵ <script type="text
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 52056
This neuron detects first-person self-referential pronouns and tokens (e.g., "I", "me", "my", and equivalents in other languages).
gpt-5-mini
gosto de muitas coisas, desde a po
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 123529
questions asking about the assistant's personal attributes or identity (age, location, appearance, name).
gpt-5-mini
can you describe what you look like?<|eot_id|><|start_header_id|>assistant
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 77487
text asking for advice, opinions, or guidance (i.e., requests for help or recommendations).
gpt-5-mini
. Anthony wants to know should he cut his losses and
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 109985
It detects the start of an assistant reply / conversation turn boundary (tokens marking or immediately after the assistant's response start).
gpt-5-mini
assistant<|end_header_id|>↵↵Here are some tips to stay awake:↵↵
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 79780
questions asking about someone's favorite or preferred thing (e.g., "favorite", "好きな", with items like color/food).
gpt-5-mini
each person's favorite color is in the table below:↵
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 105075
the presence of numeric quantities (numbers and measurements such as distances, years, counts, temperatures).
gpt-5-mini
241 miles (388 km)↵3. The Kali
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 91828
tokens used for document structure, metadata, and speaker/date labels (speaker names, IDs, and numeric/date tokens).
gpt-5-mini
the bug was there forv 7 years<|eot_id|><|start_header_id|>
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 126477
This neuron detects structural/control tokens marking conversation boundaries (end-of-turn/end-of-text and header/start markers).
gpt-5-mini
how do i meditate<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵M
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 25500