EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
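As a rough illustration of what a token-activation-pair prompt looks like, the sketch below pairs each token with its activation rescaled to an integer from 0 to 10, one tab-separated pair per line. This is a minimal standalone sketch and does not call the repository's code: the function names, the 0 to 10 scale, and the tab separator are assumptions made for illustration, and the actual default prompts may differ in detail.

```python
import math
from typing import List, Sequence


def normalize_activations(
    activations: Sequence[float], max_activation: float, scale: int = 10
) -> List[int]:
    """Rescale raw activations to integers in [0, scale].

    Hypothetical helper: negative activations are clipped to 0 and the
    value equal to max_activation maps to `scale`.
    """
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(scale, math.floor(scale * max(a, 0.0) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(
    tokens: Sequence[str], activations: Sequence[float], max_activation: float
) -> str:
    """Render one "token<TAB>level" line per token (assumed layout)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    levels = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{lvl}" for tok, lvl in zip(tokens, levels))


# Illustrative values only; these activations are made up, not taken from a real feature.
example_tokens = ["dispatchers", " received", " a", " call"]
example_activations = [8.0, 2.0, 0.0, -1.0]
print(format_token_activation_pairs(example_tokens, example_activations, 8.0))
```

Text in this shape is what the explainer model would read before proposing a one-line explanation such as those listed under Recent Explanations.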
Recent Explanations
phrases that explicitly frame something in historical context or refer to long-term origins and continuity
gpt-5
a variety of reasons — historically lower incomes, higher unemployment
GPT-OSS-20B
7-RESID-POST-AA
INDEX 2007
dates and year numbers within historical or academic texts.
claude-4-5-haiku
Routes, 1735–1815. Bount
GPT-OSS-20B
3-RESID-POST-AA
INDEX 2001
words related to emergency dispatchers and 911 dispatch operations.
claude-4-5-haiku
County Sheriff’s Office, dispatchers received a call at
GPT-OSS-20B
3-RESID-POST-AA
INDEX 12219
Chinese proximal demonstratives and simple numeral markers, especially when introducing noun phrases or section/list headings.
gpt-5
1] = 进入这间宽敞明
GPT-OSS-20B
3-RESID-POST-AA
INDEX 3001
Lines marked as additions in a diff/patch (the leading "+" that indicates an added line).
gpt-5-mini
server = NULL;↵+ buf_free (
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 20691
Mentions of the female reproductive cycle and hormone-related reproductive conditions.
gpt-5-mini
, or phase of menstrual cycle) and consequently it is
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 7327
tokens that appear in structured code/config/metadata lines — i.e., labels and colon-separated key/value markers (like fileID, Script, Editor, Prefab, and the colon).
gpt-5-mini
m_EditorHideFlags: 0↵ m
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 113225
short uppercase alphabetic tokens — acronyms or initials (e.g., two‑letter/abbreviated scientific or name initials).
gpt-5-mini
_JUNIPER_MLFR = 0
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 82116
The neuron detects markers that indicate an answer or reply/closing in a post (e.g., "A:", "Thanks", and similar reply/closing tokens).
gpt-5-mini
class?↵↵Thanks.↵↵A:↵↵You need
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 70614
Finds first- and second-person expressions of agency (I/you), especially words indicating requests, intent, ability or actions.
gpt-5-mini
manufacturers, explain that you choose only the best components of
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 92322
phrases and tokens related to health, safety, caregiving, and practical advice (medical/medical-adjacent situations).
gpt-5-mini
out of your control when driving. These can include such
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 88954
Verbs and phrases that indicate discussing, explaining, or talking about something (i.e., instances of communication or exposition).
gpt-5-mini
to spend the longest time explaining<end_of_turn>↵
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 90445
spots dialogue/turn markers—quoted speech, speaker labels (e.g., "A:", "BD:"), and punctuation that marks questions/answers or turn boundaries.
gpt-5-mini
your expansion to Wellesley?↵↵“Wellesley wanted
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 1015
signal sentence-level discourse/transition markers — words or short phrases that introduce, emphasize, or connect major points (e.g., conclusions, results, or shifts in focus).
gpt-5-mini
has generally remained elusive largely stemming from the fact that many
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 91772
This neuron detects sentence boundaries, firing strongly at the start-of-sentence token and at sentence-final punctuation.
gpt-5-mini
<bos>Lily is a student.
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 22262
mentions of automobile accidents, crashes or collisions and the related insurance/claim context.
gpt-5-mini
the scene of an automobile accident, Whitten v.
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 74936
It detects verbs that mark a clause's predicate—especially auxiliaries and state/ability verbs (e.g., "be", "able", "can", contractions like "'re", and similar forms).
gpt-5-mini
, it is necessary to be able to adjust the center
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 103532
Phrases where the author self-identifies as a beginner/newbie (tokens like "noob", "newbie", "beginner", etc.).
gpt-5-mini
am a total Grails noob trying to configure the db
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 24234
The neuron detects sequence or turn boundaries (tokens marking the end of a turn / end of the document).
gpt-5-mini
these vital processes.<end_of_turn>↵
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 27493
It detects section headings and list/numbered-item markers (numbers and words that start list or instruction lines).
gpt-5-mini
user↵<bos> state 3 months ago, and I
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 85760