EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
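As a rough illustration of what a token-activation-pair prompt format can look like: the sketch below assumes each token is shown to the explainer model next to a discretized activation, one tab-separated pair per line. The function names, the 0 to 10 scale, and the sample activation values are assumptions made for illustration and are not taken from this page; consult the repository above for the actual prompt code.

```python
import math
from typing import List

MAX_SCALE = 10  # assumed discretization range: integers 0..10


def normalize(activations: List[float], max_activation: float) -> List[int]:
    """Scale raw activations to integers in 0..MAX_SCALE; negatives clip to 0."""
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(MAX_SCALE, math.floor(MAX_SCALE * max(0.0, a) / max_activation))
        for a in activations
    ]


def format_pairs(
    tokens: List[str], activations: List[float], max_activation: float
) -> str:
    """Render one 'token<TAB>activation' line per token (hypothetical helper)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    scaled = normalize(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, scaled))


if __name__ == "__main__":
    # Illustrative values only; real activations come from the model.
    tokens = ["of", " Bulgaria", " is", " Sofia"]
    activations = [0.0, 1.0, 0.5, 4.0]
    print(format_pairs(tokens, activations, max_activation=4.0))
```

Discretizing to a small integer range keeps the prompt compact and lets the explainer model compare tokens without reading raw floating-point values.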
Recent Explanations
It detects requests or content specifically about writing LinkedIn (social-media) posts — i.e., prompts to create or options for LinkedIn post copy.
gpt-5-mini
electromobility?<end_of_turn>↵<start_of_turn>model↵Okay,
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 214
the neuron detects proper nouns or named entities (titles, organization names, and other capitalized names).
gpt-5-mini
:**↵↵* **Reboot Nation:** [https://
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 527
the neuron detects document structure markers like section headers and formatted headings (markdown-style emphasis and numbered/listed section indicators).
gpt-5-mini
differentiator.↵↵**1. Open Weights – The Core
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 15759
The neuron detects tokens that are part of the model's direct factual answer or highlighted content—especially proper nouns, numbers, and emphasized/answer text.
gpt-5-mini
of Bulgaria is **Sofia**. ↵↵It's
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 6097
language signaling severe harm or abuse—e.g., explicit slurs, sexual violence/exploitation terms, and other highly offensive or harmful content.
gpt-5-mini
66-488-7386
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 108312
mentions of specific language-model names, versions, or size identifiers (e.g., model names with suffixes like "-13B", "1.5", "16K", etc.).
gpt-5-mini
**Vicuna-13B:** Built by fine
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 3482
This neuron detects first-person self-reference (tokens like "I", "I'm", "I am" and phrases where the speaker describes themselves).
gpt-5-mini
Gemma, a large language model trained by Google DeepMind
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 11996
the neuron lights up on salient content words — especially named entities, dates/numbers, and topic-specific keywords (important nouns/terms).
gpt-5-mini
initially, it simply referred to a young woman, often
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 257825
capitalized proper nouns and acronyms denoting specific technical frameworks, AI/ML models, and formal regulatory filings or rules.
gpt-5
.↵* **Entity Framework Core (EF Core):
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 19312
the neuron detects proper names / named entities (especially personal or character names).
gpt-5-mini
D2, Shakuntala and Anand. Their nor
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 869
the neuron detects date/time-related tokens (months, days, years, and numeric time/datetime components).
gpt-5-mini
) will fall on **April 20th**,
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 123647
mentions that the model is an open-weights (open-source) model widely available to the public.
gpt-5-mini
open-weights model widely available to the public, explicitly
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 7695
tokens and phrases typical of job application cover letters and hiring-related headers (e.g., "Hiring Committee", "Dear", job titles, subject lines).
gpt-5-mini
Date]↵↵Hiring Committee↵Executive Director, Technical
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 17941
the neuron detects tokens belonging to non-Latin / foreign-language text (e.g., Cyrillic or other non-English script segments).
gpt-5-mini
ються в усіх сферах. ВММ навчаються
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 3834
key topic nouns in instructional or explanatory content.
deepseek-r1
* **Mission Critical Goals:** Emphasizes
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 13825
This neuron detects prominent headings, section titles, or otherwise emphasized/topic-signaling words in the text.
gpt-5-mini
* **Mission Critical Goals:** Emphasizes
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 13825
The neuron detects prominent document headings and titles (section headers, bolded/title-case phrases and other high-salience header text).
gpt-5-mini
<start_of_turn>model↵## Most Important Future Trends in Causal
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 28429
The neuron detects tokens that indicate code or command elements—programming-language, module/primitive, or tooling identifiers and other code-related keywords.
gpt-5-mini
* **`git branch`:** This is the command
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 1687
graphic or explicit descriptions of cannibalism (eating human flesh) or similarly gruesome content.
gpt-5-mini
one to keep them close.↵↵**2. Mor
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 732
technical ML/modeling terms (especially words about model distillation, teacher/student/soft targets, context window/tokens, and transformer-related vocabulary).
gpt-5-mini
These probabilities are "soft targets" because they contain more
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 8782