EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
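As a rough illustration of what a "token-activation pair" prompt looks like, the sketch below renders a list of tokens and their activations as tab-separated pairs, one per line. It assumes activations are clipped at zero and rescaled to integers from 0 to 10 relative to a maximum activation, and that each record is wrapped in `<start>` and `<end>` markers. The function names and these formatting details are illustrative assumptions, not code taken from the repository above.

```python
import math


def normalize_activations(activations, max_activation):
    """Scale raw activations to integers in [0, 10] (assumed convention).

    Negative values are clipped to zero before scaling.
    """
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(10, math.floor(10 * max(a, 0.0) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(tokens, activations, max_activation):
    """Render one record as tab-separated token/activation lines.

    The <start>/<end> delimiters are an assumption for illustration.
    """
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    scaled = normalize_activations(activations, max_activation)
    lines = [f"{tok}\t{act}" for tok, act in zip(tokens, scaled)]
    return "\n".join(["<start>", *lines, "<end>"])


if __name__ == "__main__":
    # Hypothetical tokens and activation values, for demonstration only.
    tokens = ["I", " am", " a", " large", " language", " model"]
    activations = [0.0, 0.5, -1.0, 2.0, 1.0, 1.5]
    print(format_token_activation_pairs(tokens, activations, max_activation=2.0))
```

An explainer model would be shown several such records and asked to summarize what the high-activation tokens have in common, which is the kind of one-line description listed under Recent Explanations.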
Recent Explanations
user query topics or keywords.
deepseek-r1
1. Violet Evergarden (Violet Evergarden)**
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 1355
mentions of personal names or name placeholders (e.g., "Name", "Your Name", salutations/signatures) in emails and templates.
gpt-5-mini
My name is [Your Name] from [Your Company
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 5100
tokens that indicate the speaker’s identity as a large language model (words like "large", "language", "model" and related self‑identifying phrases/questions).
gpt-5-mini
I am a large language model, trained by Google.
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 25361
the presence of person names (named-entity tokens identifying people).
gpt-5-mini
101 by Thomas Frank Explains:** [
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 5291
The neuron strongly detects named entities—proper nouns like people, organizations, places, and other capitalized names.
gpt-5-mini
As an illustration, CC Ltd, engaged to develop a
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 32729
It detects document titles/headings and prominent proper-noun phrases (section or project names).
gpt-5-mini
↵↵↵↵**Project ReWeave: The Core Components**
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 640
This neuron detects personal names—capitalized named entities referring to people.
gpt-5-mini
↵↵* **Katrina Law (Quinn Liu
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 815
the neuron detects proper names (personal and other capitalized named entities) in text.
gpt-5-mini
* **Dan Geer:** A very prominent figure
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 7779
The neuron detects technical or domain-specific terms and acronyms/proper nouns in the assistant's responses.
gpt-5-mini
Imaging and Communications in Medicine) is the international standard for
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 650
Tokens that mark dates, numbers, or named news/events (e.g., event titles, competitions, conferences, and other time- or event-related proper nouns).
gpt-5-mini
149th Kentucky Derby.↵*
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 3445
tokens related to CSS layout properties and values (especially CSS Grid properties like grid, grid-template-columns, gap, margin, width).
gpt-5-mini
: grid;↵ grid-template-columns:
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 8258
the neuron detects brand/product/software names and other proper nouns (named entities) often appearing in recommendation or commerce contexts.
gpt-5-mini
* **GraphicsGale (Windows Only, Free
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 5701
mentions of AI/LLM model identity and ecosystem details—covering capabilities, training/data specs, APIs, affiliations, and release timelines—especially in the context of Google’s offerings.
gpt-5
become standard.↵* **Better Training Data:**
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 6989
the phrase "Key improvements" in code explanations or documentation.
claude-4-5-sonnet
)↵```↵↵Key improvements and explanations:↵↵*
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 8985
This neuron detects the summary header “Key improvements and explanations.”
o4-mini
)↵```↵↵Key improvements and explanations:↵↵*
GEMMA-3-27B-IT
31-GEMMASCOPE-2-RES-262K
INDEX 8985
the word "soldiers" or references to military personnel/combatants.
claude-4-5-sonnet
significant clash than a skirmish, but still not a
GEMMA-3-27B-IT
53-GEMMASCOPE-2-RES-262K
INDEX 9410
This neuron is looking for numeric information (tokens representing numbers, measurements, or counts).
o4-mini
significant clash than a skirmish, but still not a
GEMMA-3-27B-IT
53-GEMMASCOPE-2-RES-262K
INDEX 9410
the neuron detects words and tokens related to military conflict and battle (e.g., battle types, troops, campaigns, and combat actions).
gpt-5-mini
significant clash than a skirmish, but still not a
GEMMA-3-27B-IT
53-GEMMASCOPE-2-RES-262K
INDEX 9410
tokens that appear in error messages, logs, or I/O/diagnostic output (e.g., "Error", response/status/read operations).
gpt-5-mini
fmt.Println("Error reading response body:", err
GEMMA-3-27B-IT
53-GEMMASCOPE-2-RES-262K
INDEX 10132
It fires on prominent topical/content words and key informative nouns (tokens that mark main ideas or important concepts in a passage).
gpt-5-mini
ate." It refers to individuals, primarily men, who
GEMMA-3-27B-IT
53-GEMMASCOPE-2-RES-262K
INDEX 46239