EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's automated interpretability method from the paper "Language models can explain neurons in language models", modified by Johnny Lin to support new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
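In the TokenActivationPair strategy, the explainer model is shown text excerpts as token/activation pairs, with activations rescaled to a small integer scale (0 to 10 in the original paper). The sketch below illustrates that idea, assuming negative activations are clamped to 0, values are scaled by the maximum activation and floored, and each excerpt is wrapped in `<start>`/`<end>` markers. The function names are hypothetical and the repository's actual prompt formatting may differ in its details.

```python
import math


def normalize_activations(activations: list[float], max_activation: float) -> list[int]:
    """Map raw activations onto integers in [0, 10]; negatives are treated as 0 (assumption)."""
    if max_activation <= 0:
        return [0] * len(activations)
    return [min(10, math.floor(10 * max(0.0, a) / max_activation)) for a in activations]


def format_token_activation_pairs(
    tokens: list[str], activations: list[float], max_activation: float
) -> str:
    """Render one excerpt as tab-separated token/activation lines between <start> and <end>."""
    normalized = normalize_activations(activations, max_activation)
    lines = [f"{token}\t{act}" for token, act in zip(tokens, normalized)]
    return "<start>\n" + "\n".join(lines) + "\n<end>"


# Made-up activation values, for illustration only.
tokens = ["The", " dispatcher", " received", " a", " call"]
acts = [0.0, 2.0, 0.5, -0.3, 1.0]
print(format_token_activation_pairs(tokens, acts, max_activation=2.0))
```

With a maximum activation of 2.0, the token " dispatcher" maps to 10, " call" to 5, " received" to 2, and the negative value is clamped to 0.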
Recent Explanations
substantive model responses and explanations, particularly longer passages with detailed technical or instructional content.
claude-4-5-haiku
speeds and directions with height in the clouds"↵<end_of_turn>
GEMMA-3-4B-IT
29-GEMMASCOPE-2-RES-65K
INDEX 1832
tokens related to structured data formatting, field separators, and punctuation that denotes hierarchical organization in complex documents.
claude-4-5-haiku
except for the last one:↵{{↵"Python
GEMMA-3-4B-IT
29-GEMMASCOPE-2-RES-65K
INDEX 3999
detailed step-by-step instructions and comprehensive explanations.
claude-4-5-haiku
on Hugging Face Hub. It tells `fast
GEMMA-3-4B-IT
29-GEMMASCOPE-2-RES-65K
INDEX 1263
tokens that are numeric values (especially floating-point or measurement-style numbers).
gpt-5-mini
image = torch.randn(3, self.image
GEMMA-3-4B-IT
29-GEMMASCOPE-2-RES-65K
INDEX 5881
instructions specifying JSON output formatting (especially the "Output everything in the following JSON object" phrase and related result-variable/field-format rules).
gpt-5-mini
except for the last one:↵{{↵"Python
GEMMA-3-4B-IT
29-GEMMASCOPE-2-RES-65K
INDEX 3999
words related to warnings, disclaimers, and formal instructional language (such as "do not," "errors," "yet," and "All").
claude-4-5-haiku
p>Fun fact: this week Time Out is the
GEMMA-2-2B
20-GEMMASCOPE-RES-16K
INDEX 15729
the word "Force" when it appears at the beginning of a sentence or as part of a proper noun or technical term.
claude-4-5-sonnet
es has returned application/force-download as the content
GEMMA-3-12B
41-GEMMASCOPE-2-RES-262K
INDEX 240835
narrative text indicating first-person perspective or character actions, particularly in role-play, dialogue, or story contexts.
claude-4-5-sonnet
re right,” I said, a small, genuine smile
GEMMA-3-4B-IT
25-GEMMASCOPE-2-TRANSCODER-262K
INDEX 8576
Based on the activation patterns across all the text samples, this neuron activates strongly on **first-person narrative perspective and introspective emotional states**, particularly when characters are processing complex feelings, memories, or moments of vulnerability.
The neuron shows high activations on pronouns like "I
claude-4-5-haiku
re right,” I said, a small, genuine smile
GEMMA-3-4B-IT
25-GEMMASCOPE-2-TRANSCODER-262K
INDEX 8576
phrases that explicitly frame something in historical context or refer to long-term origins and continuity
gpt-5
a variety of reasons — historically lower incomes, higher unemployment
GPT-OSS-20B
7-RESID-POST-AA
INDEX 2007
dates and year numbers within historical or academic texts.
claude-4-5-haiku
Routes, 1735–1815. Bount
GPT-OSS-20B
3-RESID-POST-AA
INDEX 2001
words related to emergency dispatchers and 911 dispatch operations.
claude-4-5-haiku
County Sheriff’s Office, dispatchers received a call at
GPT-OSS-20B
3-RESID-POST-AA
INDEX 12219
Chinese proximal demonstratives and simple numeral markers, especially when introducing noun phrases or section/list headings.
gpt-5
1] = 进入这间宽敞明
GPT-OSS-20B
3-RESID-POST-AA
INDEX 3001
Lines marked as additions in a diff/patch (the leading "+" that indicates an added line).
gpt-5-mini
server = NULL;↵+ buf_free (
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 20691
Mentions of the female reproductive cycle and hormone-related reproductive conditions.
gpt-5-mini
, or phase of menstrual cycle) and consequently it is
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 7327
tokens that appear in structured code/config/metadata lines — i.e., labels and colon-separated key/value markers (like fileID, Script, Editor, Prefab, and the colon).
gpt-5-mini
m_EditorHideFlags: 0↵ m
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 113225
short uppercase alphabetic tokens — acronyms or initials (e.g., two‑letter/abbreviated scientific or name initials).
gpt-5-mini
_JUNIPER_MLFR = 0
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 82116
The neuron detects markers that indicate an answer or reply/closing in a post (e.g., "A:", "Thanks", and similar reply/closing tokens).
gpt-5-mini
class?↵↵Thanks.↵↵A:↵↵You need
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 70614
Finds first- and second-person expressions of agency (I/you), especially words indicating requests, intent, ability or actions.
gpt-5-mini
manufacturers, explain that you choose only the best components of
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 92322
phrases and tokens related to health, safety, caregiving, and practical advice (medical/medical-adjacent situations).
gpt-5-mini
out of your control when driving. These can include such
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 88954