EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
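As a rough illustration of what a token-activation-pair prompt looks like, the sketch below renders each token next to its activation, one tab-separated pair per line, with activations scaled to integers from 0 to 10. The function names, the floor-based rounding, and the clamping of negative values are assumptions made for illustration; they are not taken from the repository linked above.

```python
import math


def normalize_activations(activations, max_activation):
    """Scale raw activations to integers in 0..10 (hypothetical helper).

    Negative activations are clamped to 0; the scaling and rounding
    rule here are assumptions, not the repository's confirmed behavior.
    """
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(10, math.floor(10 * max(0.0, a) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(tokens, activations, max_activation):
    """Render one 'token<TAB>activation' pair per line (hypothetical helper)."""
    normalized = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, normalized))


if __name__ == "__main__":
    # Illustrative values only; not real activations from any feature.
    print(format_token_activation_pairs(
        ["The", " more", " funds"], [0.0, 8.0, 2.0], 8.0
    ))
```

An explainer model would be shown text in this form and asked to summarize what the high-activation tokens have in common.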
Recent Explanations
the hyphen in "heavily-" when it appears as part of compound modifiers or adjectives.
claude-4-5-sonnet
↵↵Heavily-armed pirates from Somalia have
GPT2-SMALL
8-RES_FS12288-JB
INDEX 1080
numeric literals, especially integers and floating-point numbers in code.
claude-4-5-sonnet
mem.disp != 0: disp = abs(
GPT2-SMALL
8-RES_FS12288-JB
INDEX 1000
nouns referring to specific entities or categories within a particular domain or context.
claude-4-5-sonnet
centred around one specific team: the Winnipeg Jets.
GPT2-SMALL
8-RES_FS12288-JB
INDEX 10000
comparative phrases expressing increase or correlation, particularly "the more... the more" constructions.
claude-4-5-sonnet
need your support! The more funds we raise the better
GPT2-SMALL
8-RES_FS12288-JB
INDEX 200
comparative phrases using "the more" followed by a consequence or result.
claude-4-5-haiku
need your support! The more funds we raise the better
GPT2-SMALL
8-RES_FS12288-JB
INDEX 200
the syllable "Hy" at the beginning of words, particularly in proper nouns and names.
claude-4-5-sonnet
Mario Kart 8↵↵Hyrule Warriors↵↵Captain
GPT2-SMALL
8-RES_FS12288-JB
INDEX 10
proper nouns, particularly names of people and places.
claude-4-5-haiku
Mario Kart 8↵↵Hyrule Warriors↵↵Captain
GPT2-SMALL
8-RES_FS12288-JB
INDEX 10
sentences that state technical explanations or factual/descriptive information (i.e., salient content words in expository sentences).
gpt-5-mini
divided into smaller chunks.↵2. Model Parallelism
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 105123
It detects descriptions of a person’s clothing and physical appearance.
gpt-5-mini
↵belly button; she also wore a pair of tight
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 70374
The neuron detects numeric expressions—numbers, measurements, and decimal/percentage-like tokens.
gpt-5-mini
.0 (0.6--42.7
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 55267
the neuron responds to content-bearing or topical words (important nouns, verbs, pronouns and discourse markers) rather than function or filler tokens.
gpt-5-mini
in response to the ever-changing demands of the modern
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 76062
sentences or phrases expressing future claims, promises, or predictions (marked by modal/future constructions like "would," "going to," "will").
gpt-5-mini
17.↵↵“New roads and high roads”:
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 21511
legal discussion of standards of review—phrases contrasting questions of law and fact (de novo, standard of review, jurisdiction, etc.).
gpt-5-mini
fact issue whatever is involved in reaching that determination. In
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 38277
signals that a section heading or paragraph-level label (e.g., a titled or colon-ended section start) is beginning.
gpt-5-mini
study. Results and conclusions: The patients had upper respiratory
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 63269
It detects numeric tokens and digit-heavy sequences (numbers, figure/section/table indices and other multi-digit numeric strings).
gpt-5-mini
IFN-γ, IL-10, and TNF
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 95160
tokens carrying strong semantic content or topical importance (salient content words).
gpt-5-mini
unsure what is and isn't true and who,
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 1882
mentions of "viruses" (references to viruses).
gpt-5-mini
32}↵================================↵↵Attempts to add the
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 20651
the neuron highlights salient, information-dense tokens—important content words (main verbs, nouns, numbers) and emphatic punctuation that carry the core facts or claims.
gpt-5-mini
U.S. prisoners have been released from North Korea
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 47513
This neuron detects the definite article "the" (the token " the", especially in phrases like "What is the ...").
gpt-5-mini
user↵<bos>What is the t'th term of
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 74100
tokens that introduce or express an evaluative/opinionative stance (judgments, endorsements, or assessments).
gpt-5-mini
Luke Mula↵↵Okay<end_of_turn>↵
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 106969