OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Uses the default prompts from the main branch, with the TokenActivationPair strategy.
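As a rough illustration of what a token/activation-pair prompt format can look like, the sketch below pairs each token with a discretized activation on a 0 to 10 scale, one pair per line. The function names, the tab separator, and the clipping and flooring rule are assumptions made for this example, not a copy of the upstream implementation.

```python
import math
from typing import List, Sequence


def normalize_activations(
    activations: Sequence[float], max_activation: float
) -> List[int]:
    """Map raw activations to integers in [0, 10].

    Assumption for this sketch: negative values are clipped to 0 and
    the result is floored after scaling by the maximum activation.
    """
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(10, math.floor(10 * max(0.0, a) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(
    tokens: Sequence[str],
    activations: Sequence[float],
    max_activation: float,
) -> str:
    """Render one 'token<TAB>activation' pair per line (hypothetical layout)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    normalized = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, normalized))


if __name__ == "__main__":
    # Made-up activation values, for illustration only.
    print(
        format_token_activation_pairs(
            [" it", " nonetheless", " retains"], [0.0, 2.0, -0.5], 2.0
        )
    )
```

An explainer model would then be shown text in this form and asked to summarize what the high-activation tokens have in common.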
Recent Explanations
formal, institutional language—especially abstract nouns and titles tied to legal or religious authority, often accompanied by concessive transitions like nonetheless or nevertheless
gpt-5
of the option; it nonetheless retains its essential characteristic as
phrases that describe something as occurring in or derived from nature, typically using an adjective before a noun in scientific or technical contexts.
verbs indicating concrete actions taken by someone (often the author) to do, create, or try something, especially in technical/problem‑solving contexts.