OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Uses the default prompts from the main branch with the TokenActivationPair strategy.
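As a rough illustration of what a token-activation-pair prompt looks like, the sketch below scales each token's activation to an integer from 0 to 10 and writes one token and one value per line, separated by a tab. The function names, the floor-based rounding, and the clamping of negative values are assumptions made for this example; they are not copied from the explainer's code.

```python
import math
from typing import List, Sequence


def normalize_activations(activations: Sequence[float], max_activation: float) -> List[int]:
    """Scale raw activations to integers in 0..10 (assumed scheme: floor, negatives clamped to 0)."""
    if max_activation <= 0:
        # No positive reference value to scale against, so every level is 0.
        return [0 for _ in activations]
    return [
        min(10, max(0, math.floor(10 * a / max_activation)))
        for a in activations
    ]


def format_token_activation_pairs(
    tokens: Sequence[str],
    activations: Sequence[float],
    max_activation: float,
) -> str:
    """Render one "token<TAB>level" line per token (hypothetical helper)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    levels = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{lvl}" for tok, lvl in zip(tokens, levels))


if __name__ == "__main__":
    # Made-up activation values, for illustration only.
    print(format_token_activation_pairs(["computer", " is", " a"], [0.0, 2.5, 5.0], 5.0))
```

The explainer model would then be shown text in this paired form and asked to describe what the highly activating tokens have in common.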
Recent Explanations
news-style political/official reporting, especially proper names of people/places/institutions, titles/ranks, dates/numbers, and quoted attributions in statements.
high-frequency function words that serve as grammatical glue—prepositions, determiners, pronouns, and auxiliary/modals that link phrases and mark relationships.
gpt-5
computer is a device capable of solving problems by processing information
biomedical health-effects language, especially terms about immune function, diseases/pathogens, and therapeutic impacts on inflammation and blood lipids.
gpt-5
Often thought to merely support normal bowel function and blood glucose
section or bullet headings/titles—typically short, capitalized labels followed by punctuation (like colons or dashes) that introduce a new section or feature list.
gpt-5
Effect on Alcohol Use Outcomes*: Not measured*FOLLOW-