EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
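As a rough illustration of what a token-activation pair layout can look like, the sketch below renders each token on its own line next to an integer activation scaled to 0-10. The function name, the tab separator, and the flooring rule are assumptions made for illustration; they are not taken from the repository's prompt code, which should be consulted for the exact format.

```python
def format_token_activation_pairs(tokens, activations, max_scale=10):
    """Render 'token<TAB>activation' lines, with activations scaled to 0..max_scale.

    Hypothetical helper: the name and scaling rule are illustrative assumptions.
    """
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")

    # Scale relative to the largest activation in this record.
    peak = max(activations, default=0.0)

    lines = []
    for token, act in zip(tokens, activations):
        if peak > 0:
            # Clip negatives to zero, then floor onto the integer scale.
            scaled = min(max_scale, int(max_scale * max(act, 0.0) / peak))
        else:
            # A record with no positive activation maps to all zeros.
            scaled = 0
        lines.append(f"{token}\t{scaled}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_token_activation_pairs(["run", "external", "commands"], [0.0, 1.5, 3.0]))
```

An explainer model given such pairs sees which tokens carry the highest scaled values and summarizes the pattern in a short phrase, like the entries listed under Recent Explanations.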
Recent Explanations
This neuron never activates—it doesn’t detect any meaningful pattern (a “dead” neuron).
o4-mini
to run external commands and capture their input/output streams
GEMMA-3-27B-IT
16-GEMMASCOPE-2-RES-262K
INDEX 13935
This neuron detects mentions of dialog slot names together with their values (e.g. “area = east”, “people = 3”).
o4-mini
bool Repaint);' -Name 'Win32
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 16142
The neuron activates strongly on words that describe taking away or withholding possessions (e.g. “confiscation,” “footwear,” “defending,” “preoccupied”).
o4-mini
And you’re clearly preoccupied with self-inflicted
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 12732
function words that serve as grammatical glue—especially prepositions and auxiliary verb contractions linking actions or clauses
gpt-5
And you’re clearly preoccupied with self-inflicted
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 12732
imperative action commands in code or technical instructions, especially those manipulating windows or performing drawing/movement operations.
gpt-5
bool Repaint);' -Name 'Win32
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 16142
uncommon subword fragments, especially question-word stems and isolated letter-like tokens.
gpt-5
<bos><start_of_turn>user↵Coq10/l-
GEMMA-3-27B-IT
53-GEMMASCOPE-2-RES-262K
INDEX 239477
the model's self-identification — it activates on mentions of the assistant's name / self-introduction.
gpt-5-mini
↵↵My name is Gemma! I was trained by the
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 99258
named entities, especially distinctive proper nouns like company, brand, platform, or person names.
gpt-5
sites here relevant to Enron,s businesses.↵
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 4240
titles/headings and salient domain-specific terms or proper nouns that signal the main topic of a passage.
gpt-5
↵## Making a Classic Cheesecake: A Comprehensive Guide↵↵
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 5701
tokens used in headings or emphasized/important document structure (bold markers, section numbers, dates, and other emphasis/heading tokens).
gpt-5-mini
become standard.↵* **Better Training Data:**
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 6989
chat turn-taking structure and the assistant’s opening response markers (role tokens and initial affirmations).
gpt-5
zombies<end_of_turn>↵<start_of_turn>model↵Okay, you want *
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 7491
markdown-style section headers and subheadings in outlines, especially bolded headings that end with a colon.
gpt-5
<end_of_turn>↵<start_of_turn>model↵Okay, the fall of
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 1069
words describing prohibited content types and policy violations on online platforms.
claude-4-5-haiku
threats, hate speech, advocating violence and other violations can
GEMMA-2-27B
22-GEMMASCOPE-RES-131K
INDEX 11854
the character sequence "thro" inside tokens (a common subword in medical/biological terms).
gpt-5-mini
Deceased, and Iola Saunders, Administratrix cum
GEMMA-2-2B
1-CLT-HP
INDEX 1
Mentions of running external processes or using subprocess/shell commands to execute and capture program input/output.
gpt-5-mini
to run external commands and capture their input/output streams
GEMMA-3-27B-IT
16-GEMMASCOPE-2-RES-262K
INDEX 13935
the neuron responds to technical or scientific content—terms, measurements, and data-heavy/highly specific words found in experimental or domain-specific descriptions.
gpt-5-mini
under its native promoter. RNAseq data were generated from
GEMMA-3-27B-IT
16-GEMMASCOPE-2-RES-262K
INDEX 20363
text written in a robotic/AI persona with formal, protocol-driven technical phrasing, structured acknowledgments, and system-style markers (often including numeric designations).
gpt-5
4. Mimicry protocol initiated. Acknowledged
GEMMA-3-27B-IT
40-GEMMASCOPE-2-RES-262K
INDEX 12303
tokens that occur at the start of a sentence or turn (beginning-of-sentence/turn tokens).
gpt-5-mini
<bos><start_of_turn>user↵Coq10/l-
GEMMA-3-27B-IT
53-GEMMASCOPE-2-RES-262K
INDEX 239477
The neuron fires on emphasized or strongly intensifying tokens (words marked or used to add emphasis).
gpt-5-mini
Absolutely essential. Hermitage Museum (world-
GEMMA-3-27B-IT
53-GEMMASCOPE-2-RES-262K
INDEX 166260
sentences that are section headings, numbered list items, or other structural/formatting markers (e.g., list numbers and section labels).
gpt-5-mini
(2, 3)↵* **Order:**
GEMMA-3-27B-IT
53-GEMMASCOPE-2-RES-262K
INDEX 82843