OpenAI's Automated Interpretability from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models/context windows.
Default prompts from the main branch, strategy TokenActivationPair.
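The TokenActivationPair strategy builds the explainer prompt by pairing each token in an activation record with its activation value. A minimal sketch of that serialization, assuming a tab-separated "token, activation" format and the 0-to-10 discretization used in OpenAI's neuron-explainer pipeline (the function name and exact rescaling here are illustrative):

```python
def format_token_activation_pairs(tokens, activations, max_activation):
    """Serialize one activation record as tab-separated token/activation lines.

    Activations are rescaled to integers in 0..10, mirroring the
    discretization used in the neuron-explainer prompts.
    """
    lines = []
    for token, act in zip(tokens, activations):
        # Clamp negatives to 0 and rescale against the record's max activation.
        scaled = round(10 * max(act, 0) / max_activation) if max_activation > 0 else 0
        lines.append(f"{token}\t{scaled}")
    return "\n".join(lines)

# Example record for a neuron that fires on numeric tokens (cf. the
# "numerals/math expressions" explanation below); values are made up.
record = format_token_activation_pairs(
    ["the", "year", "1999", "="],
    [0.0, 0.4, 3.8, 3.2],
    max_activation=4.0,
)
print(record)
```

The explainer model sees many such records for one neuron and is asked to summarize what the high-activation tokens have in common, which produces explanations like those listed below.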
Recent Explanations
descriptions of competency-based education and mastery-oriented progress in learning (advancement based on demonstrated competencies rather than time).
gpt-5
‑time to the demonstration of mastery of explicit, observable
The neuron fires strongly on numeric tokens (digits, numbers including years, and related symbols such as "="); i.e., it is looking for numerals and math expressions.
o4-mini
Invention of the iPhone Than to the Building of the Great
The neuron detects mentions of clinical symptoms and symptom-describing phrases in medical text, especially respiratory symptoms and related clinical descriptors.
gpt-5-mini
uritic chest pain can occur in patients and presents as
The neuron detects informative, content-heavy words (longer nouns/verbs and section-heading tokens) — i.e., key informational terms in instructional or explanatory text.
gpt-5-mini
and how this company differentiates itself.↵* **The
The neuron strongly activates on the names of software products, platforms, frameworks, or technical components (e.g. “WhatsApp,” “Android,” “.NET,” “MVVM/WPF,” etc.).
o4-mini
turn on dark mode on WhatsApp for Android, follow these
Statements and headings that frame structured analysis or troubleshooting, signaling problem identification, core issues, challenges, breakdowns, and considerations.
This neuron spots words and phrases that introduce or label problems, such as "issue," "breakdown," "core problem," or other signals that a difficulty is being explained.
Critical evaluations of media that call out contrivance or unrealistic, overly neat/predictable elements, often marked by intensifiers and evaluative qualifiers.