OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models/context windows.
The neuron primarily activates on frequently occurring words like "the" and "and" when they appear in technical or instructional contexts, often in close proximity to numbers or specialized terms.
gemini-2.5-flash
pattern may include the sub-steps of: comparing the
The neuron spotlights special control or header tokens (such as `<|start_header_id|>`, `<|end_header_id|>`, and similar markers) that delimit and label parts of the chat transcript.
o4-mini
"Do you understand?"<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵No!