OpenAI's Automated Interpretability from paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models/context windows.
This neuron lights up on informal, interactive parts of user comments, especially question marks and small reaction/interjection tokens (e.g. “back,” “wow,” “now?”) that signal a conversational or reactive utterance.
A strong detector for sudden, emphatic exclamations or high-intensity emotional interjections (loud reactions, urgencies, and similar bursty dialogue).
Phrases and tokens that mark the start of an assistant’s explanatory or organizing reply (e.g., "Okay", "Here", section headers and similar discourse markers).
The neuron activates when the model is producing explanatory/corrective output—reformulations, translations, grammar corrections, and related instructional text.
gpt-5-mini