OpenAI's Automated Interpretability from paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models/context windows.
statements and headings that frame structured analysis or troubleshooting, signaling problem identification, core issues, challenges, breakdowns, and considerations.
This neuron spots words and phrases that introduce or label problems—like “issue,” “breakdown,” “core problem,” or other signals that a difficulty is being explained.
critical evaluations of media that call out contrivance or unrealistic, overly neat/predictable elements, often marked by intensifiers and evaluative qualifiers.
statements expressing uncertainty or lack of knowledge, such as noting information is unknown, unknowable, unclear, scarce, or not readily ascertainable.
gpt-5
group was really trained is lost or has
formal disclaimer and limitation language that negates capabilities, promises, rights, or responsibilities (e.g., “will not,” “does not,” “only,” “cannot”).
gpt-5
Shelves app. It will not erase your previous collection.
technical or metadata-like tokens: acronyms, formal category/league labels, file/module paths, dates, and numerals typical of encyclopedic, legal, or programming text.