EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
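As a rough illustration of what a token-activation-pair strategy presents to the explainer model, the sketch below pairs each token with its activation, rescaled to an integer from 0 to 10. This is a minimal sketch, not the repository's code: the function names, the tab-separated layout, and the 0 to 10 scale are assumptions made for illustration, and the example tokens and activation values are invented.

```python
import math


def normalize_activations(activations, max_activation):
    """Map raw activations onto integers 0..10 (assumed scale); negatives clip to 0."""
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(10, math.floor(10 * max(a, 0.0) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(tokens, activations, max_activation):
    """Render one 'token<TAB>activation' line per token (assumed layout)."""
    normalized = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, normalized))


# Hypothetical example: a feature that fires on the token "The".
tokens = ["The", " decomposition", " profiles"]
activations = [5.0, 0.4, 0.0]
print(format_token_activation_pairs(tokens, activations, max_activation=5.0))
```

The explainer model would then be asked to summarize, in one sentence, which tokens carry the high values; the entries under Recent Explanations are summaries of that kind.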
Recent Explanations
It detects the capitalized definite article "The", especially at the start of sentences or section/paragraph openings.
gpt-5-mini
↵-------------------------------------------↵↵The decomposition profiles of the four
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 28322
mentions of the occurrence or onset of an event or symptoms (words/phrases indicating someone experienced something or when it happened).
gpt-5-mini
started leaking or when she experienced the first onset of symptoms
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 105505
tokens that signal factual statements or methodological/results-related assertions in scientific/technical writing.
gpt-5-mini
I2C_RATE_3↵ | MANT
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 53566
The neuron detects emphatic assertions that something is true or real—claims by the speaker insisting the information is genuine or not a joke.
gpt-5-mini
know that sounds crazy but true. Someone may say something
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 101301
The neuron is looking for proper names and named-entity tokens (personal names and other capitalized entity words).
gpt-5-mini
want to write about Henry’s cousin Jesse. I
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 72712
Indicators of regulation or changes in expression level (mentions that a process or gene is up‑ or down‑regulated).
gpt-5-mini
and oxidative phosphorylation (OXPHOS) is downregulated
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 85763
The neuron detects numeric and quantitative information — i.e., measurements, statistics and other quantitative expressions in scientific or technical text.
gpt-5-mini
showed a dose-dependent suppression<end_of_turn>↵
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 93494
instances of the first-person pronoun "I" (self-references).
gpt-5-mini
does come from, because I'm very interested by
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 82493
tokens that appear in technical or formatted contexts (code identifiers, XML/tags, section-heading or list-intro words and other emphasized/structural document tokens).
gpt-5-mini
Wikipedia, so take it for what it’s worth
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 73367
sentences or phrases that are asking a question (especially wh‑words like "why/what/how" and other interrogative phrasing).
gpt-5-mini
encryption and certificates, so why would using private/public
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 78220
spots first- and second-person pronouns and other self-/addressee-focused words (e.g., "I", "you", "we", "do") indicating speaker-directed or conversational language.
gpt-5-mini
pokemon fanfiction, or whatever you'd like to say
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 40786
The neuron detects sentence-initial interrogative or conditional clause starters—i.e., the beginnings of questions or conditionals.
gpt-5-mini
<bos><start_of_turn>user↵When is it today?<end_of_turn>
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 47101
the neuron detects salient content words—informative nouns/adjectives and topical keywords in the text.
gpt-5-mini
, a gate-to-source capacitance of the PM
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 40167
It detects tokens that mark reported speech or attribution (e.g., words and phrases indicating someone says, told, claims, was asked, or was told).
gpt-5-mini
here I've been told that unnecessarily using sudo should
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 56649
the neuron detects discourse or stance markers — short words that signal emphasis, evaluation, comparison, or framing (e.g., "truth", "clear", "more/than", "for", "into", "able").
gpt-5-mini
that they have learned much of anything from the 2
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 116269
phrases describing nausea, vomiting, diarrhea, or related motion/sea-sickness symptoms.
gpt-5-mini
known for making riders experience nausea. The Tilt-A
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 90400
the neuron detects evaluative or normative language — words expressing judgments, approval/disapproval, permissibility, harm or suppression.
gpt-5-mini
concludes that it is NOT permissible to say, "I
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 1515
The neuron detects tokens that introduce or point to important statements or key content (words that mark results, topics, or clause-leading discourse markers).
gpt-5-mini
b a') m r that "reflects" the
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 27228
the neuron detects words and phrases expressing intention, willingness, ability, or deliberate action (volition).
gpt-5-mini
as you gradually let go of your beliefs, did the
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 116770
statements expressing measurement, dependence, or causal/functional relationships in technical or scientific text.
gpt-5-mini
the initial longitudinal magnetization that is left after the dummy trains
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 20372