EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's automated interpretability method, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to support new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
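For readers unfamiliar with the TokenActivationPair strategy, the sketch below shows roughly what a token/activation listing in an explainer prompt might look like. It is not code from the linked repository. It assumes that negative activations are clipped to zero, that values are discretized to integers from 0 to 10 relative to the neuron's maximum activation, and that each record is wrapped in hypothetical `<start>`/`<end>` markers. The function names `normalize_activations` and `format_token_activation_pairs` are illustrative only.

```python
import math


def normalize_activations(activations, max_activation):
    """Clip negatives to 0 and scale to integers in 0..10 (assumed scheme)."""
    if max_activation <= 0:
        # A neuron that never fires positively gets all-zero scores.
        return [0 for _ in activations]
    return [
        min(10, math.floor(10 * max(a, 0.0) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(tokens, activations, max_activation):
    """Render one record as 'token<TAB>activation' lines (hypothetical markers)."""
    scaled = normalize_activations(activations, max_activation)
    lines = ["<start>"]
    lines += [f"{tok}\t{act}" for tok, act in zip(tokens, scaled)]
    lines.append("<end>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_token_activation_pairs(
        ["My", " name", " is", " Gemma"], [0.0, 0.5, -0.2, 2.0], 2.0))
```

Under these assumptions, the usage example prints each token next to a score from 0 to 10. In this case " Gemma" receives 10, and the negative activation on " is" is clipped to 0.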
Recent Explanations
This neuron activates on programming keywords, type identifiers, and technical terms in source code, particularly those that define or reference code structures, classes, functions, and data types.
claude-4-5-sonnet
RequestDetailsType import TxRequestDetails
GEMMA-2-27B
34-GEMMASCOPE-RES-131K
INDEX 110163
mentions of racism, harmful/discriminatory content, or policy-style refusals explaining why hateful content can't be provided.
gpt-5-mini
. They normalize prejudice and reinforce harmful biases.↵*
GEMMA-3-4B-IT
22-GEMMASCOPE-2-RES-65K
INDEX 402
This neuron never activates—it doesn’t detect any meaningful pattern (a “dead” neuron).
o4-mini
to run external commands and capture their input/output streams
GEMMA-3-27B-IT
16-GEMMASCOPE-2-RES-262K
INDEX 13935
This neuron detects mentions of dialog slot names together with their values (e.g. “area = east”, “people = 3”).
o4-mini
bool Repaint);' -Name 'Win32
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 16142
The neuron activates strongly on words that describe taking away or withholding possessions (e.g. “confiscation,” “footwear,” “defending,” “preoccupied”).
o4-mini
And you’re clearly preoccupied with self-inflicted
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 12732
function words that serve as grammatical glue—especially prepositions and auxiliary verb contractions linking actions or clauses
gpt-5
And you’re clearly preoccupied with self-inflicted
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 12732
imperative action commands in code or technical instructions, especially those manipulating windows or performing drawing/movement operations.
gpt-5
bool Repaint);' -Name 'Win32
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 16142
uncommon subword fragments, especially question-word stems and isolated letter-like tokens.
gpt-5
<bos><start_of_turn>user↵Coq10/l-
GEMMA-3-27B-IT
53-GEMMASCOPE-2-RES-262K
INDEX 239477
the model's self-identification — it activates on mentions of the assistant's name / self-introduction.
gpt-5-mini
↵↵My name is Gemma! I was trained by the
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 99258
named entities, especially distinctive proper nouns like company, brand, platform, or person names.
gpt-5
sites here relevant to Enron,s businesses.↵
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 4240
titles/headings and salient domain-specific terms or proper nouns that signal the main topic of a passage.
gpt-5
↵## Making a Classic Cheesecake: A Comprehensive Guide↵↵
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 5701
tokens used in headings or emphasized/important document structure (bold markers, section numbers, dates, and other emphasis/heading tokens).
gpt-5-mini
become standard.↵* **Better Training Data:**
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 6989
chat turn-taking structure and the assistant’s opening response markers (role tokens and initial affirmations).
gpt-5
zombies<end_of_turn>↵<start_of_turn>model↵Okay, you want *
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 7491
markdown-style section headers and subheadings in outlines, especially bolded headings that end with a colon.
gpt-5
<end_of_turn>↵<start_of_turn>model↵Okay, the fall of
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 1069
words describing prohibited content types and policy violations on online platforms.
claude-4-5-haiku
threats, hate speech, advocating violence and other violations can
GEMMA-2-27B
22-GEMMASCOPE-RES-131K
INDEX 11854
the character sequence "thro" inside tokens (a common subword in medical/biological terms).
gpt-5-mini
Deceased, and Iola Saunders, Administratrix cum
GEMMA-2-2B
1-CLT-HP
INDEX 1
Mentions of running external processes or using subprocess/shell commands to execute and capture program input/output.
gpt-5-mini
to run external commands and capture their input/output streams
GEMMA-3-27B-IT
16-GEMMASCOPE-2-RES-262K
INDEX 13935
the neuron responds to technical or scientific content—terms, measurements, and data-heavy/highly specific words found in experimental or domain-specific descriptions.
gpt-5-mini
under its native promoter. RNAseq data were generated from
GEMMA-3-27B-IT
16-GEMMASCOPE-2-RES-262K
INDEX 20363
text written in a robotic/AI persona with formal, protocol-driven technical phrasing, structured acknowledgments, and system-style markers (often including numeric designations).
gpt-5
4. Mimicry protocol initiated. Acknowledged
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 12303
tokens that occur at the start of a sentence or turn (beginning-of-sentence/turn tokens).
gpt-5-mini
<bos><start_of_turn>user↵Coq10/l-
GEMMA-3-27B-IT
53-GEMMASCOPE-2-RES-262K
INDEX 239477