OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Uses the default prompts from the main branch with the TokenActivationPair strategy.
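As the name suggests, the TokenActivationPair strategy presents the explainer model with text excerpts written out as token/activation pairs. The following is a minimal sketch of that kind of formatting. It is not the repository's code, and several details are assumptions: activations are clamped at zero and rescaled to integers from 0 to 10 relative to the largest activation across all excerpts, each pair goes on its own tab-separated line, and each excerpt is wrapped in `<start>`/`<end>` markers. The function names are hypothetical.

```python
import math

# Hypothetical helpers illustrating token/activation-pair prompt formatting.
# The 0..10 scale and <start>/<end> delimiters are assumptions, not the repo's API.

def max_activation(records):
    """Largest raw activation across all (tokens, activations) records."""
    return max((max(acts) for _, acts in records if acts), default=0.0)

def normalize_activations(activations, max_act):
    """Clamp negatives to 0 (ReLU) and rescale to integers in 0..10."""
    if max_act <= 0:
        return [0 for _ in activations]
    return [min(10, math.floor(10 * max(a, 0.0) / max_act)) for a in activations]

def format_record(tokens, activations, max_act, omit_zeros=False):
    """One 'token<TAB>activation' line per token; optionally drop zero rows."""
    normalized = normalize_activations(activations, max_act)
    lines = [
        f"{tok}\t{act}"
        for tok, act in zip(tokens, normalized)
        if not (omit_zeros and act == 0)
    ]
    return "\n".join(lines)

def format_records(records, max_act, omit_zeros=False):
    """Join several excerpts, each wrapped in assumed <start>/<end> markers."""
    bodies = [format_record(t, a, max_act, omit_zeros) for t, a in records]
    return "\n<start>\n" + "\n<end>\n<start>\n".join(bodies) + "\n<end>\n"

if __name__ == "__main__":
    records = [(["The", " Dinner", " Party"], [0.0, 2.5, 5.0])]
    print(format_records(records, max_activation(records)))
```

Rescaling to a small integer range keeps the prompt compact and makes activations comparable across excerpts. Omitting zero rows shortens long excerpts further, at the cost of hiding which tokens did not fire.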
Recent Explanations
This neuron is primarily activated by references to women and girls, often in discussions of their social status, rights, demographics, or vulnerabilities.
This neuron identifies explicit mentions of feminist figures, organizations, and works, as well as broader concepts related to feminism and gender studies.
gemini-2.5-flash
for her art installation "The Dinner Party," currently located
mentions of technical systems, tools, or components—especially in computing/cryptography or mechanical contexts—often in constructions with possessive pronouns indicating ownership or association.
enterprise IT and networking jargon, especially acronyms and technical terms for infrastructure, virtualization, identity/access, content management, and storage/file systems.
object-oriented member fields/properties and direct assignments to them in code (e.g., class-level variable declarations and this.field = value lines).