OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Uses the default prompts from the main branch with the TokenActivationPair strategy.
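In the TokenActivationPair strategy, each token in an excerpt is shown to the explainer model next to its activation, discretized to an integer scale as described in the paper. A minimal sketch of that formatting step (the function name, tab-separated layout, and 0-10 scale here are illustrative assumptions, not the repository's actual API):

```python
def format_token_activation_pairs(tokens, activations, max_activation):
    """Render each (token, activation) pair on its own line, with the
    activation rescaled to an integer in [0, 10] relative to the neuron's
    maximum activation (the discretization used in the paper)."""
    lines = []
    for token, act in zip(tokens, activations):
        if max_activation > 0:
            scaled = round(10 * max(act, 0.0) / max_activation)
        else:
            scaled = 0
        lines.append(f"{token}\t{scaled}")
    return "\n".join(lines)

# Example: tokens from an excerpt where the neuron fires on personal pronouns.
pairs = format_token_activation_pairs(
    ["I", " have", " booked"], [1.8, 2.0, 0.3], max_activation=2.0)
print(pairs)
```

The resulting text block is embedded in the explainer prompt, and the model is asked to summarize what the high-activation tokens have in common.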
Recent Explanations
The neuron detects first- and second-person pronouns and related conversational verb forms that indicate personal/addressing language (e.g., "I", "we", "you", "have", "had").
gpt-5-mini
The neuron is sensitive to tokens occurring in formal or technical/mathematical contexts—e.g. LaTeX commands, variables, theorem- or proof-style wording, and other formulaic expressions.
The neuron detects document-structure and formatting/markup elements (LaTeX/math constructs, section headings/labels, metadata and other non-prose formatting tokens).
This neuron lights up on informal, interactive bits of user comments—especially question marks and small reaction/interjection tokens (e.g. “back,” “wow,” “now?”) that signal a conversational or reactive utterance.
A strong detector for sudden, emphatic exclamations or high-intensity emotional interjections (loud reactions, urgencies, and similar bursty dialogue).
The neuron detects phrases and tokens that mark the start of an assistant's explanatory or organizing reply (e.g., "Okay", "Here", section headers and similar discourse markers).