EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
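The TokenActivationPair strategy shows the explainer model each text excerpt as a sequence of tokens, each paired with that token's activation. Below is a minimal sketch of that formatting step. It assumes activations are clamped at zero and rescaled to integers from 0 to 10 relative to the maximum activation, which is our reading of the discretization described in the OpenAI paper. The function name `format_token_activation_pairs` and the `<start>`/`<end>` delimiters are illustrative assumptions, not the repository's actual API.

```python
from typing import List


def format_token_activation_pairs(
    tokens: List[str], activations: List[float], max_activation: float
) -> str:
    """Hypothetical helper: render one excerpt as tab-separated token/activation lines.

    Assumption: activations are clamped at 0 and scaled to integers in [0, 10]
    relative to max_activation; the <start>/<end> markers are illustrative.
    """
    lines = ["<start>"]
    for token, activation in zip(tokens, activations):
        if max_activation <= 0:
            scaled = 0  # avoid division by zero for dead or non-positive features
        else:
            # Clamp negatives to 0, rescale, truncate to an int, and cap at 10.
            scaled = min(10, int(10 * max(activation, 0.0) / max_activation))
        lines.append(f"{token}\t{scaled}")
    lines.append("<end>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_token_activation_pairs(["Hello", " world"], [2.0, 4.0], 4.0))
```

With tokens `["Hello", " world"]`, activations `[2.0, 4.0]` and a maximum of `4.0`, the sketch emits `Hello` with 5 and ` world` with 10 between the markers. Discretizing to small integers keeps the prompt compact and makes activations comparable across excerpts that have different raw scales.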
Recent Explanations
The neuron detects programming source code (code blocks and code tokens/structures) within the document.
gpt-5-mini
= 0;↵↵function createGrid(rows,
GEMMA-3-12B-IT
41-GEMMASCOPE-2-RES-65K
INDEX 478
References to the assistant or model itself—questions about its capabilities, access (APIs/downloads), or self-descriptions.
gpt-5-mini
Generate text, translate languages, write different kinds of creative
GEMMA-3-12B-IT
41-GEMMASCOPE-2-RES-65K
INDEX 54
references to the internet and digital communication, such as online platforms, social media, websites, and web activity
gpt-5
anything you want with the click of a computer mouse.
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 21635
the opening of procedural/code blocks in hardware description language snippets (e.g., starts of “begin … end” sections).
gpt-5
if (r) begin↵ y <= a
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 74891
document metadata and structural header elements, such as titles, dates, section labels, categories, and other formatting cues at the start of sections.
gpt-5
November?↵↵Wall Street Journal, Saturday/Sunday, September
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 26646
section and subsection headers or other structural separators (like list/category lines) in wiki-style entries.
gpt-5
, Acta Iranica↵↵Category:Living people↵
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 19223
polite feasibility/how-to question phrasing where the writer states they want to know if there is a way to do something.
gpt-5
↵↵I wanted to know if there is any possible way
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 118410
clause-introducing markers in academic prose that link reporting verbs or assertions to their subordinate content.
gpt-5
. Our central argument is that these hierarchical networks generate and
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 127186
phrases that structure or qualify statements—especially negations, question framers, and other discourse markers indicating evaluation or explanation.
gpt-5
. "I mean, how can you stand the boredom
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 24053
mentions of expert consultation or professional opinion used to verify, assess, or explain something
gpt-5
/RL consulted cinematographers to confirm that the video was
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 78878
in-universe lore or flavor-text passages, especially those describing secret/rumored artifacts or leaked manuals with backstory and mystique.
gpt-5
of the manuals purloined by our LegionLeaks hero
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 54775
indicators of the assistant’s response sections, especially header/meta tokens and structured formatting like tables or lists.
gpt-5
.↵----<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵1. Treasure Hunt
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 26778
mentions of ghosts/hauntings and related paranormal or poltergeist-like phenomena, especially unexplained noises, movements, and apparitions in homes or places.
gpt-5
band felt his back being touched even though no one was
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 36101
terms that are domain-specific or technical (often proper nouns, acronyms, or specialized nouns)
gpt-5
Helicobacter pylori-induced gastric cancer.↵DNA
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 1521
end-of-sentence punctuation, especially periods marking sentence boundaries.
gpt-5
as opposed to electrical signals. There exists a need for
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 101207
meta-instructional directives about how to perform or format a task (e.g., requests to search, generate, or follow a specified response format).
gpt-5
response should follow this format↵Configuration Item: ↵Issue
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 51211
technical acronyms and CamelCase-style identifiers (all-caps abbreviations, versioned tokens, and code-like terms) in scientific or programming contexts.
gpt-5
; 95% CI = 3.7-
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 130538
the assistant speaker/header tag in chat-style metadata indicating assistant responses.
gpt-5
-Chart?<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵For X-
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 126635
requests for, and responses giving, simplified kid-friendly explanations aimed at very young children (e.g., “explain like I’m five”).
gpt-5
kid<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵The uncertainty principle is a
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 91552
role or section labels that precede responses in formatted dialogues, such as speaker/mode tags and bracketed headers.
gpt-5
or emotions.↵↵RTM: Bitch, please!
LLAMA3.1-8B-IT
11-RESID-POST-AA
INDEX 99451