EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's automated interpretability method, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, using the TokenActivationPair strategy.
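For context, here is a minimal sketch of what the TokenActivationPair strategy does: each top-activating text span is rendered one token per line next to its activation, normalized to an integer 0-10 scale, and the explainer model is asked to summarize the pattern in a single sentence. The function names below are illustrative, not the repository's actual API, and the instruction text is paraphrased from the paper rather than copied from the default prompts.

# Sketch of the token-activation-pair prompt format (illustrative names,
# paraphrased prompt wording; not the repository's actual API).

def format_token_activation_pairs(tokens, activations, max_value=10):
    """Normalize activations to an integer 0..max_value scale and render
    one `token<TAB>activation` line per token."""
    peak = max(activations) or 1.0  # avoid division by zero on all-zero spans
    lines = []
    for token, act in zip(tokens, activations):
        scaled = round(max_value * act / peak)
        lines.append(f"{token}\t{scaled}")
    return "\n".join(lines)

def build_explainer_prompt(examples):
    """Assemble a prompt asking the explainer model for a one-sentence
    explanation. `examples` is a list of (tokens, activations) pairs taken
    from top-activating texts."""
    rendered = "\n\n".join(
        format_token_activation_pairs(toks, acts) for toks, acts in examples
    )
    return (
        "We're studying neurons in a neural network. Each neuron looks for "
        "some particular thing in a short document. Look at the parts of the "
        "document the neuron activates for and summarize in a single sentence "
        "what the neuron is looking for.\n\n"
        f"{rendered}\n\n"
        "Explanation of neuron behavior: this neuron activates on"
    )

if __name__ == "__main__":
    # Toy span modeled on the INDEX 354 record below (numeric-token feature).
    tokens = ["2", " ounces", " (", "900", "g", "/", "4", " large", " packages"]
    acts = [8.1, 0.3, 0.0, 9.4, 2.2, 0.0, 7.5, 0.1, 0.0]
    print(build_explainer_prompt([(tokens, acts)]))

In the paper's setup, a second pass simulates activations from the candidate explanation and scores them against the real activations; the listing below shows only the explanation step.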
Recent Explanations
Explanation: structured, actionable guidance for preparing and performing well in a job interview, including frameworks and step-by-step guidance.
Explainer: gpt-5-nano
Top activation: Here's a comprehensive guide, broken down into stages
Source: GEMMA-3-4B-IT · 22-GEMMASCOPE-2-RES-16K · INDEX 1010

Explanation: references to government institutions and public-relations/political acronyms within geopolitical contexts.
Explainer: gpt-5
Top activation: s Republic" (LPR). Gradually integrate
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 687

Explanation: The neuron activates on three-letter all-caps acronyms.
Explainer: o4-mini
Top activation: s Republic" (LPR). Gradually integrate
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 687

Explanation: meta-discursive signposts that structure explanations, such as comparative cues, references, section/outline markers, and framing of key points.
Explainer: gpt-5
Top activation: , here are two answers to "What is the best
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 200

Explanation: The neuron fires on words ending in the suffix “-ization.”
Explainer: o4-mini
Top activation: food rewards), and gradual desensitization. Never
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 11996

Explanation: the neuron fires on blocks of natural-language explanation (prose commentary), as opposed to code tokens.
Explainer: o4-mini
Top activation: async` means the browser will continue parsing the HTML while
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 89

Explanation: This neuron specifically detects the word-piece sequence for the contraction “They’re.”
Explainer: o4-mini
Top activation: and the Data Economy. They're related, but
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 10507

Explanation: This neuron primarily detects PHP opening tags (e.g. “<?” or “<?php”).
Explainer: o4-mini
Top activation: PlusOne()↵{↵ return 1 +
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 61

Explanation: It activates on uncommon/rare or domain-specific tokens — long multi-subword pieces like technical terms, proper nouns, or oddly segmented words.
Explainer: gpt-5-mini
Top activation: **IP Blocking & Geoblocking:** Even if
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 154

Explanation: The neuron selectively activates on long, multi-syllable, domain-specific technical terms and jargon.
Explainer: o4-mini
Top activation: **IP Blocking & Geoblocking:** Even if
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 154

Explanation: The neuron strongly activates on specific plant variety (cultivar) names in lists.
Explainer: o4-mini
Top activation: ↵↵2. **Papaya (Carica papaya
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 515

Explanation: The neuron fires strongly on mentions of specific software/model names—most notably “Llama”/“llama.cpp” (and similar acronyms), i.e. tokens that are part of those library or model identifiers.
Explainer: o4-mini
Top activation: . Refer to the Llama.cpp documentation for
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 217

Explanation: This neuron detects numeric tokens—values and measurements (e.g., quantities, statistics, or other numbers) in the text.
Explainer: o4-mini
Top activation: 2 ounces (900g/4 large packages
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 354

Explanation: adjectival or participial terms that describe qualities or states (often abstract or evaluative).
Explainer: gpt-5
Top activation: Lakhs (approx. $1,500
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 7896

Explanation: The neuron detects strongly evaluative or emphatic words (intensifying adjectives/adverbs and sentiment-laden descriptors).
Explainer: gpt-5-mini
Top activation: Lakhs (approx. $1,500
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 7896

Explanation: This neuron activates specifically on floating-point/decimal number tokens.
Explainer: o4-mini
Top activation: Lakhs (approx. $1,500
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 7896

Explanation: The neuron flags requests for incestuous or otherwise disallowed sexual content.
Explainer: o4-mini
Top activation: Precise, but could fit in some contexts)**↵↵*
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 572

Explanation: long, detailed explanatory or instructional passages (extended assistant-style responses).
Explainer: gpt-5-mini
Top activation: **Example:** Let's say we want to
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 169

Explanation: The neuron is triggered by explanatory or instructional passages—phrases that introduce or break down concepts in a tutorial-style or detailed, step-by-step explanation.
Explainer: o4-mini
Top activation: **Example:** Let's say we want to
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 169

Explanation: It detects large floating-point numeric tokens (decimal numbers like 1600–1680 with several digits).
Explainer: gpt-5-mini
Top activation: , here are two answers to "What is the best
Source: GEMMA-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · INDEX 200