OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models/context windows.
The neuron detects document metadata and boilerplate — headings, bylines/author lines, dates, copyright/contact info and other structural header elements.
finds words and tokens expressing possibility or impossibility and the idea that something is difficult (e.g., "possible", "impossible", "task", "monstrous").
gpt-5-mini
realise this may well be considered an impossible task that's
with absolutely nothing but harm, charges, and no help
Looking at the activations, this neuron activates strongly on:
- Possessive constructions with apostrophes (Village Clerk**'s**, driver**'s**, owner**'s**)
- References to specific locations/jurisdictions (Village of Hempstead
claude-4-5-sonnet
higher conference.↵↵It’s amazing how good of
non-English text, particularly Greek and other European languages.
claude-4-5-sonnet
Σε κανένα στάδιο της διαδικασίας η εται
narrative turning points and transitions where key events or actions occur in a story.
claude-4-5-haiku
Managing investment portfolios for individuals or institutions. (Requires
The neuron is essentially inactive and does not respond to any token—it finds nothing.
o4-mini
in its pathogenesis, namely, serotonin, glutamate, nore
The neuron lights up on general-purpose discourse or stance markers—common evaluative or filler words and phrases like “OK,” “fine,” “absolutely,” “that,” “it,” and “was.”
The neuron fires on technical terms referring to optical or color‐spectral properties (e.g. spectral characteristics, color filters, wavelengths, colored light).
o4-mini
are generally provided with spectral filters for the three colors,