EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
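As a rough illustration of what a token-activation-pair prompt looks like, the sketch below renders one text excerpt as tab-separated token/activation lines, with activations rescaled to integers from 0 to 10. The function names, the `<start>`/`<end>` delimiters, and the exact rounding are assumptions made for illustration, not the repository's actual API or prompt text.

```python
from typing import List, Sequence


def normalize_activations(activations: Sequence[float], max_activation: float) -> List[int]:
    """Scale raw activations to integers in 0..10 (assumed scheme).

    Negative activations are clamped to 0; values above max_activation
    are clamped to 10.
    """
    if max_activation <= 0:
        return [0 for _ in activations]
    return [min(10, max(0, int(10 * a / max_activation))) for a in activations]


def format_token_activation_pairs(
    tokens: Sequence[str],
    activations: Sequence[float],
    max_activation: float,
) -> str:
    """Render one excerpt as 'token<TAB>activation' lines (hypothetical layout)."""
    normalized = normalize_activations(activations, max_activation)
    lines = [f"{tok}\t{act}" for tok, act in zip(tokens, normalized)]
    return "\n".join(["<start>", *lines, "<end>"])


if __name__ == "__main__":
    # Illustrative values only; these are not real activations from any feature.
    print(format_token_activation_pairs(["Carol", " (", "Neighbor"], [4.0, 0.0, 2.0], 4.0))
```

The explainer model would then be asked to summarize what the high-activation tokens have in common, which is the kind of one-sentence explanation listed under Recent Explanations.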
Recent Explanations
This neuron detects personal names / named entities (proper names of people).
gpt-5-mini
that." |↵| Carol (Neighbor) | Ben
GEMMA-3-27B-IT
16-GEMMASCOPE-2-RES-262K
INDEX 8666
the neuron detects prominent technical topic keywords — especially acronyms, product/model names, and domain-specific terms.
gpt-5-mini
<start_of_turn>user↵what is macro in programming? Explain it
GEMMA-3-27B-IT
16-GEMMASCOPE-2-RES-262K
INDEX 9350
This neuron detects prominent topic words or subject tokens (main nouns/proper nouns) that indicate the central subject of a query or document.
gpt-5-mini
philosophers and scientists about Artificial intelligence (AI) is rapidly
GEMMA-3-27B-IT
16-GEMMASCOPE-2-RES-262K
INDEX 7732
This neuron detects named entities/proper nouns (people, brands, place names, model names and other capitalized terms).
gpt-5-mini
то такий 2Pac?**↵↵*
GEMMA-3-27B-IT
16-GEMMASCOPE-2-RES-262K
INDEX 5849
It detects prominent named entities and salient topic tokens (titles, product names, and other key content words).
gpt-5-mini
best" tactic in Football Manager 2023
GEMMA-3-27B-IT
16-GEMMASCOPE-2-RES-262K
INDEX 141914
tokens in assistant messages that offer help or ask the user for more information (requests to share code/details or invitations to continue).
gpt-5-mini
you have some code already, please share it!↵
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 6171
This neuron detects salient topical content words—domain-specific nouns and named entities that carry the main subject matter of a passage.
gpt-5-mini
"face of the Western propaganda" and a symbol of
GEMMA-3-27B-IT
16-GEMMASCOPE-2-RES-262K
INDEX 31655
tokens representing numbers and dates (numerical values like years, months, times, counts, and other numeric tokens).
gpt-5-mini
News:**↵↵* **Israel-Hamas Conflict
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 40948
the neuron detects strong affirmative or positive-response tokens—i.e., when the model is asserting agreement or labeling content as positive.
gpt-5-mini
?<end_of_turn>↵<start_of_turn>model↵Absolutely, a Baywatch
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 50796
It detects long verbatim or block‑delimited quoted passages (text inside quotes/triple‑quotes or other context blocks).
gpt-5-mini
Committee (IAEC).↵"""<end_of_turn>↵<start_of_turn>model
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 3202
The neuron detects the "model" (assistant) speaker token—i.e., the start of model/assistant responses.
gpt-5-mini
chess<end_of_turn>↵<start_of_turn>model↵```c↵#include
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 111414
the presence of anger-related words or strong angry emotion (tokens expressing anger/frustration).
gpt-5-mini
a mixture of sadness, anger, disappointment and numbness,
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 13847
text-structuring tokens (headings, section titles, list/item markers, and other formatting/organization cues).
gpt-5-mini
and convenience features are must-haves?↵↵↵↵**
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 10907
the starts of conditional or hypothetical clauses—tokens that begin “if/imagining” style questions or hypothetical statements.
gpt-5-mini
↵↵**In short: If you're writing for
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 244489
The neuron detects emphatic, declarative claims—strong assertions or superlative statements that stress ability, uniqueness, or certainty.
gpt-5-mini
**↵↵"My superpower? Turning chaos into cuteness
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 200216
The neuron detects salient content-carrying words — important task/topic nouns and verbs (i.e., semantically informative tokens).
gpt-5-mini
passes automated tests is automatically deployed to production.↵*
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 3974
Text that gives explicit instructions or directives to the model—especially prompts to answer questions, assume a persona/alter ego, or perform a specified role.
gpt-5-mini
the night. T answers questions with general statements and does
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 76785
It detects evaluative or summary phrases that state overall assessments or qualifiers (comparisons like "relatively", "are", "all") in explanatory text.
gpt-5-mini
/complexity, but all are relatively accessible):**↵↵
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 249537
the presence of the model/assistant role token that marks the model's generated response.
gpt-5-mini
Dragon?<end_of_turn>↵<start_of_turn>model↵## Helicopters in
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 4568
This neuron detects mentions or references to a child character (especially male child/son) in narrative contexts.
gpt-5-mini
by both people."↵↵Leo scrunched up his
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 14575