EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
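As a rough illustration of what a token-activation-pair prompt might contain, the sketch below pairs each token with its activation rescaled to an integer from 0 to 10, with negative activations clamped to 0, following the general description in the paper. The function names `normalize` and `format_pairs` are hypothetical and written for this example; they are not taken from the repository, whose actual prompt layout may differ.

```python
from typing import List


def normalize(activations: List[float], max_activation: float) -> List[int]:
    """Rescale raw activations to integers in [0, 10].

    Assumption: negative values are clamped to 0 and the scale is set by
    max_activation (e.g. the largest activation observed for the feature).
    """
    if max_activation <= 0:
        return [0 for _ in activations]
    return [min(10, int(10 * max(a, 0.0) / max_activation)) for a in activations]


def format_pairs(tokens: List[str], activations: List[float], max_activation: float) -> str:
    """Render one 'token<TAB>activation' line per token (hypothetical layout)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    scaled = normalize(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, scaled))


if __name__ == "__main__":
    # Toy values chosen for illustration only; not real feature activations.
    print(format_pairs(["As", " of", " January"], [0.0, 1.0, 4.0], max_activation=4.0))
```

An explainer model would be shown several such blocks for a feature's top-activating texts and asked to summarize what the high-scoring tokens have in common.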
Recent Explanations
The neuron detects mentions of scams, fraud, impersonation, and related deceptive or criminal activities.
gpt-5-mini
user↵<bos>419148?↵
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 78934
sentences or phrases where the speaker expresses having an idea, thought, or suggestion (first-person cognitive/introspective statements).
gpt-5-mini
've had an idea. Coats matching the mane
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 14876
Finds words that are main action verbs (verbs indicating actions or agency, especially present-tense/third-person and other salient verb forms).
gpt-5-mini
use the increasing transistor budget to build ever bigger and more
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 80826
The neuron detects sentence endings, strongly activating on sentence-final punctuation (periods) and the ends of sentences.
gpt-5-mini
take care of their vessels. The listing does not confer
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 122357
The neuron detects structural or symbolic tokens—numbers, single-letter/math symbols, brackets and formatting/LaTeX tokens, and other document-structure markers.
gpt-5-mini
convicted on each count as charged by the amended indictment.
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 40180
it responds to long user turns / large blocks of contiguous text (i.e., firing when a turn is lengthy).
gpt-5-mini
subtypes.↵↵ <end_of_turn>↵
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 8482
it detects code- or math-expression style tokens — control-flow and syntax markers, variable names, and numeric literals.
gpt-5-mini
., any more that it cares about the political circus (“
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 30546
words that signal official statements, policies, principles, goals or other formal/authoritative claims.
gpt-5-mini
the principle of the integrity of Denmark, stipulated that the
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 66822
detects expressions of grief, mourning, condolence, or references to death and loss.
gpt-5-mini
felt a painful sting in her chest, knowing she wouldn
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 53290
The neuron detects structural markers and headings in the text—things like section breaks, metadata tokens, big punctuation headers (=====/-----), and other prominent line-start tokens (e.g., "Q:", "Model", "What", "I'm").
gpt-5-mini
their responsibilities."↵<end_of_turn>↵
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 68607
the neuron detects racist or strongly derogatory language aimed at social groups (demeaning/offensive statements).
gpt-5-mini
↵<bos> being a thief is very useful, and an
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 60654
References to dates or time markers (months and years, often in "as of" or similar date-stamp contexts).
gpt-5-mini
attacks.↵As of January 2011
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 102435
The neuron detects questions—tokens and contexts that form interrogative sentences (question words and/or a question mark).
gpt-5-mini
that where the passion began?↵↵“Yes, it
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 105834
the neuron detects descriptions of restless or agitated physical movements (people moving about, fidgeting, pacing, or otherwise physically acting out).
gpt-5-mini
having to get up and pace.↵↵Also this article
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 96917
the neuron detects technical, numeric, or math-like tokens (numbers, operators, variables, and other technical/structured code/math tokens).
gpt-5-mini
)/c**(2/7))**(1/4
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 16799
the neuron detects sentence-level punctuation and clause boundaries (commas, periods, quotation marks and other discourse-transition tokens).
gpt-5-mini
the ills downstream from them.↵People must learn to
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 93626
mentions of parental concern/protectiveness or parents trying to control/limit a child's behavior.
gpt-5-mini
stop being so over-protective so he can grow into
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 123395
the neuron detects document structure markers and section headings (e.g., Methods, Data sources, Ethics, Availability) and other formatting/metadata lines.
gpt-5-mini
↵↵Data sources and searches↵----------------<end_of_turn>↵
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 39004
phrases expressing strong emotion or emphasis (exclamations and emphatic interjections).
gpt-5-mini
m deceased.↵↵Opal: Deceased? Really?↵↵
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 54315
the neuron activates on content-bearing, informative tokens (important nouns/verbs/adjectives and discourse-focus words) rather than on function words.
gpt-5-mini
is enough. For others, it’s just the
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 82974