OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
The neuron strongly activates on multiword title‐cased phrases used in formal or marketing contexts (e.g. product names, corporate titles, branded jargon).
o4-mini
sloh specialized Customer Service Representatives will respond to your inquiry
self-referential AI/LLM meta-text, especially first‑person descriptions of system status/capabilities and roleplay/jailbreak scenarios about hacking, data processing, or formatting.
gpt-5
projecting directly into Prometheus’s processing core):** Identification
high-intensity evaluative or emphatic modifiers, especially adjectives/adverbs indicating uniqueness, novelty, importance, extremity, or strong quality.
gpt-5
, she has developed a unique child development and education framework