OpenAI's Automated Interpretability from paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models/context windows.
Self-referential AI/LLM meta-text, especially first-person descriptions of system status/capabilities and roleplay/jailbreak scenarios about hacking, data processing, or formatting.
gpt-5
projecting directly into Prometheus’s processing core):** Identification
High-intensity evaluative or emphatic modifiers, especially adjectives/adverbs indicating uniqueness, novelty, importance, extremity, or strong quality.
gpt-5
, she has developed a unique child development and education framework
The neuron detects first- and second-person pronouns and related conversational verb forms that indicate personal/addressing language (e.g., "I", "we", "you", "have", "had").
gpt-5-mini
launch.↵Those who I have already got booked on will
The neuron is sensitive to tokens occurring in formal or technical/mathematical contexts, e.g., LaTeX commands, variables, theorem- or proof-style wording, and other formulaic expressions.
The neuron detects document-structure and formatting/markup elements (LaTeX/math constructs, section headings/labels, metadata and other non-prose formatting tokens).
This neuron lights up on informal, interactive bits of user comments, especially question marks and small reaction/interjection tokens (e.g., “back,” “wow,” “now?”) that signal a conversational or reactive utterance.
A strong detector for sudden, emphatic exclamations or high-intensity emotional interjections (loud reactions, urgencies, and similar bursty dialogue).