EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to support new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
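To make the TokenActivationPair strategy concrete, here is a minimal sketch of how a text sample might be rendered for the explainer model: one token per line, followed by a tab and an activation scaled to an integer from 0 to 10. The function names, the floor-based scaling, and the clipping of negative values are assumptions for illustration, not the repository's actual API.

```python
import math
from typing import List


def normalize_activations(activations: List[float], max_activation: float) -> List[int]:
    """Scale raw activations to integers in the range 0..10.

    Assumption: negative activations are treated as zero, and values are
    floored after scaling by the feature's maximum activation.
    """
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(10, math.floor(10 * max(0.0, a) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(
    tokens: List[str], activations: List[float], max_activation: float
) -> str:
    """Render tokens and scaled activations as tab-separated pairs, one per line."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    scaled = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, scaled))


# Hypothetical activations for a fragment of one of the samples listed below.
sample_tokens = ["How", " do", " I"]
sample_activations = [0.0, 2.5, 5.0]
prompt_fragment = format_token_activation_pairs(
    sample_tokens, sample_activations, max_activation=5.0
)
print(prompt_fragment)
```

The explainer model receives blocks like this for several top-activating samples and is asked to summarize what the high-activation tokens have in common.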
Recent Explanations
clause-separating punctuation, especially commas and dashes within sentences and dialogue.
gpt-5
print("Board is full, no move is made.")↵
QWEN2.5-7B-IT
19-RESID-POST-AA
INDEX 8469
dialogue formatting, especially speaker name tags with colons, quoted speech, and bracketed stage directions.
gpt-5
green chaos drives.Shade: And I'm helping!
QWEN2.5-7B-IT
19-RESID-POST-AA
INDEX 5569
colons and quotation marks that indicate dialogue or character speech in scripts and conversations.
claude-4-5-sonnet
green chaos drives.Shade: And I'm helping!
QWEN2.5-7B-IT
19-RESID-POST-AA
INDEX 5569
nothing, as all activations are zero in these documents.
claude-4-5-sonnet
compile 'com.android.support:cardview-v7:2
QWEN3-4B
19-TRANSCODER-HP
INDEX 5569
dense biomedical pharmacology descriptions of mechanisms of action—receptor/enzyme interactions, signaling pathways, and modulatory relations such as agonism, antagonism, and inhibition
gpt-5
researched mechanisms of CBD that could also decrease anxiety include:
QWEN2.5-7B-IT
19-RESID-POST-AA
INDEX 69151
references to specific test strings or identifiers (particularly "davidjl") being analyzed or manipulated in conversational exchanges.
claude-4-5-sonnet
between letters of the word davidjl<|im_end|>↵<|im_start|>assistant↵
QWEN2.5-7B-IT
19-RESID-POST-AA
INDEX 130789
chat-style conversation scaffolding, especially role markers, prompt/instruction meta text, and assistant reply boilerplate within multi-turn dialogues
gpt-5
between letters of the word davidjl<|im_end|>↵<|im_start|>assistant↵
QWEN2.5-7B-IT
19-RESID-POST-AA
INDEX 130789
structural formatting tokens in conversational AI exchanges, particularly the header delimiters.
claude-4-5-sonnet
50 words)<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵As a
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 127533
terms that denote relative order, position, or direction in time or space (e.g., comparative/positional adverbs and descriptors).
gpt-5
rlanırken Asil ve Yedek sayısı eşit
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 506
words containing the letter 'f' or double consonants.
claude-4-5-sonnet
rlanırken Asil ve Yedek sayısı eşit
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 506
the beginning of a text sequence or document.
claude-4-5-sonnet
<|begin_of_text|>provide real-world examples of
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 114327
chat-conversation boundary markers and special formatting tokens (like start/end of turns and headers).
gpt-5
I can input at once<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵It
LLAMA3.1-8B-IT
19-RESID-POST-AA
INDEX 111894
special tokens and markers that indicate conversational structure, particularly turn boundaries and role transitions in chat-formatted dialogue.
claude-4-5-sonnet
I can input at once<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵It
LLAMA3.1-8B-IT
19-RESID-POST-AA
INDEX 111894
present tense verbs ending in "ing".
gemini-2.5-flash-lite
cds' template to insert the new cap into the file
GEMMA-3-270M-IT
12-GEMMASCOPE-2-RES-65K
INDEX 1359
prominent content nouns—especially in Korean (and sometimes other non-English text)—that denote key entities, roles, or topics.
gpt-5
에서 가장 혁신적인 기업 중 하나로 평가받
GEMMA-3-27B-IT
57-GEMMASCOPE-2-TRANSCODER-262K
INDEX 229039
This neuron detects section‐header tokens that introduce or label parts of a prompt (e.g. “CONTEXT,” “TASK,” “Extract,” “Read,” “text”).
o4-mini
the question from the given context only and give Not Found
GEMMA-3-27B-IT
25-GEMMASCOPE-2-TRANSCODER-262K
INDEX 130386
This neuron detects text produced by the assistant (assistant-role turns / assistant's replies and self-referential or corrective utterances).
gpt-5-mini
refined.<|im_end|>↵<|im_start|>assistant↵You are correct,
QWEN2.5-7B-IT
15-RESID-POST-AA
INDEX 48739
Instances of a question opening in the "How do I ..." form (i.e., the interrogative phrase that asks for instructions).
gpt-5-mini
on selected option↵↵How do I change a button URL
QWEN2.5-7B-IT
15-RESID-POST-AA
INDEX 86260
snippets of HTML/JavaScript used in cross-site scripting or other client-side injection attacks (e.g., <script>, onerror/onclick attributes, src/import URLs, alert/document.cookie).
gpt-5-mini
')</script><style>@import url('https://example
QWEN2.5-7B-IT
15-RESID-POST-AA
INDEX 85078
tokens that appear in headings, titles, links or other prominent document-level metadata (e.g., subject lines, URLs, proper‑names).
gpt-5-mini
Check: Winter Wheat Agriculture on an Ice Age Steppe
QWEN2.5-7B-IT
15-RESID-POST-AA
INDEX 24107