EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
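As a rough illustration of what a token-activation-pair strategy involves, the sketch below renders a text excerpt as one "token<TAB>activation" line per token, with activations scaled to integers from 0 to 10, which is how the paper describes presenting neuron activity to the explainer model. This is not code from the repository: the function names, the floor-based scaling, and the sample values are assumptions made for illustration.

```python
import math


def normalize_activations(activations, max_activation):
    """Scale raw activations to integers in 0..10 (hypothetical helper).

    Negative activations are clamped to 0. If max_activation is not
    positive, every value maps to 0.
    """
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(10, math.floor(10 * max(0.0, a) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(tokens, activations, max_activation):
    """Render one 'token<TAB>activation' line per token (hypothetical helper)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    scaled = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, scaled))


# Made-up sample values, not real activations from any feature listed here.
example = format_token_activation_pairs(
    ["In", " January", ","], [0.0, 4.0, 1.0], max_activation=4.0
)
print(example)
```

The explainer model would then be asked to summarize what the high-valued tokens have in common, which is the kind of one-line description shown in the entries under Recent Explanations.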
Recent Explanations
The neuron detects user query turns—that is, lines where the user asks a question.
o4-mini
and 256-bit key sizes.
GEMMA-3-12B-IT
24-GEMMASCOPE-2-RES-262K
INDEX 1520
It detects first-person self-statements—phrases where the speaker (I/I'm/I will) announces or reports their own actions, status, intentions, or requests.
gpt-5-mini
notification that I will be taking sick leave today, [
GEMMA-3-12B-IT
24-GEMMASCOPE-2-RES-262K
INDEX 2499
mentions of the current U.S. President (Joe Biden) and tokens indicating currentness or specific dates/times.
gpt-5-mini
the United States is **Joe Biden**. He assumed office
GEMMA-3-12B-IT
12-GEMMASCOPE-2-RES-262K
INDEX 8312
tokens representing numbers, dates, years, or other numeric/factual metadata (including turn/start markers).
gpt-5-mini
is the first woman to hold that office.
GEMMA-3-12B-IT
12-GEMMASCOPE-2-RES-262K
INDEX 31225
This neuron detects numeric tokens—numbers, years, scores, and other digit-containing tokens in the text.
gpt-5-mini
fields (typically 1.5 to 10
GEMMA-3-12B-IT
24-GEMMASCOPE-2-RES-262K
INDEX 7216
the neuron detects question/request turns — it fires on tokens that appear in user queries asking for factual information.
gpt-5-mini
and 256-bit key sizes.
GEMMA-3-12B-IT
24-GEMMASCOPE-2-RES-262K
INDEX 1520
This neuron detects date expressions—especially month names and nearby numeric tokens that form calendar dates.
gpt-5-mini
update for today, **November 3, 2
GEMMA-3-12B-IT
24-GEMMASCOPE-2-RES-262K
INDEX 1184
tokens from the model/assistant's reply—especially self-referential or help/clarification phrases (the assistant speaking).
gpt-5-mini
you too! 👋 It's lovely to hear from
GEMMA-3-12B-IT
24-GEMMASCOPE-2-RES-262K
INDEX 8539
Detects when the model/assistant is producing a long, structured response—activating on tokens that mark assistant-generated content (introductions, headings, list or reply-openers).
gpt-5-mini
in the world<end_of_turn>↵<start_of_turn>model↵Okay, compiling
GEMMA-3-12B-IT
12-GEMMASCOPE-2-RES-262K
INDEX 586
The neuron detects date-related tokens (months, days, years, and other tokens used in dates).
gpt-5-mini
<bos><start_of_turn>user↵In January, a garden center contacted
GEMMA-3-12B-IT
12-GEMMASCOPE-2-RES-262K
INDEX 3896
The neuron detects numeric timestamp‑style tokens (numbers with decimal/float-like values often appearing after dates or time markers).
gpt-5-mini
, 2023**. I'm
GEMMA-3-12B-IT
24-GEMMASCOPE-2-RES-262K
INDEX 16928
finds numeric tokens (years, dates, and other numbers).
gpt-5-mini
September 2021**. I don't
GEMMA-3-12B-IT
24-GEMMASCOPE-2-RES-262K
INDEX 4823
The neuron detects temporal references — months, seasons, and explicit date/time mentions (e.g., "May", "spring", "mid‑May 2024").
gpt-5-mini
next week (mid-May 2024
GEMMA-3-12B-IT
24-GEMMASCOPE-2-RES-262K
INDEX 6114
tokens that are numeric timestamps or high-precision numeric date/time markers.
gpt-5-mini
, 2013:<end_of_turn>↵<start_of_turn>model
GEMMA-3-12B-IT
24-GEMMASCOPE-2-RES-262K
INDEX 2942
mentions of specific years and dated time markers, especially recent calendar years and fiscal year labels (e.g., FY-year).
gpt-5
February 2024, succeeding Mike Ybarra
GEMMA-3-12B-IT
12-GEMMASCOPE-2-RES-262K
INDEX 38411
calendar dates and date/time calculations or recurrence-rule syntax.
gpt-5
50 days after July 18th,
GEMMA-3-12B-IT
12-GEMMASCOPE-2-RES-262K
INDEX 44652
references to specific calendar years and dated time markers in text.
gpt-5
In 2013, Jagex recognized
GEMMA-3-12B-IT
24-GEMMASCOPE-2-RES-262K
INDEX 3111
The neuron activates on numeric tokens representing floating‐point values (e.g. “650.4088”)
o4-mini
, 2023. **Please read
GEMMA-3-12B-IT
12-GEMMASCOPE-2-RES-262K
INDEX 6364
The neuron primarily detects four‐digit year tokens (e.g. “2015,” “2023”) in the text.
o4-mini
:** 2015↵* **Label
GEMMA-3-12B-IT
12-GEMMASCOPE-2-RES-262K
INDEX 6402
This neuron detects year-like numeric tokens (four-digit years / date numbers) in the text.
gpt-5-mini
" (1999) establece el mundo complejo
GEMMA-3-12B-IT
12-GEMMASCOPE-2-RES-262K
INDEX 1135