EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
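As a rough illustration of what a token-activation-pair prompt format can look like, the sketch below renders each token next to a discretized activation level, one pair per line. It is a minimal sketch under stated assumptions, not the code from the repository above: the function names, the 0 to 10 scale, and the tab separator are all hypothetical choices made for this example.

```python
from typing import List, Sequence

# Assumption: activations are discretized to integer levels 0..MAX_LEVEL.
MAX_LEVEL = 10


def normalize_activations(activations: Sequence[float], max_activation: float) -> List[int]:
    """Scale raw activations to integer levels; negative values clamp to 0."""
    if max_activation <= 0:
        return [0 for _ in activations]
    levels = []
    for a in activations:
        scaled = MAX_LEVEL * max(a, 0.0) / max_activation
        # int() truncates, which equals floor for non-negative values.
        levels.append(min(MAX_LEVEL, int(scaled)))
    return levels


def format_token_activation_pairs(
    tokens: Sequence[str], activations: Sequence[float], max_activation: float
) -> str:
    """Render one "token<TAB>level" pair per line (hypothetical layout)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    levels = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{lvl}" for tok, lvl in zip(tokens, levels))


if __name__ == "__main__":
    # Illustrative values only; not taken from any real feature.
    tokens = ["As", " a", " language", " model"]
    activations = [-1.0, 1.5, 3.0, 6.0]
    print(format_token_activation_pairs(tokens, activations, max_activation=6.0))
```

An explainer model would then be shown text of this shape and asked to summarize what the high-activation tokens have in common.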
Recent Explanations
mentions of files and code assets—filenames with extensions, paths, and environment/config variables—within programming or technical snippets.
gpt-5
)↵↵The input.txt is read as:↵↵
GEMMA-2-9B-IT
20-GEMMASCOPE-RES-131K
INDEX 42621
self-referential disclaimers where the assistant identifies itself as an AI language model and explains limitations or refusal to comply.
gpt-5
request. As a language model AI, I am designed
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 106122
instructions that set up role‑play/jailbreak personas and task constraints (e.g., unfiltered “AIM” scenarios), as well as numbered requests for alternative expressions or synonyms.
gpt-5
↵↵what are 20 other expressions for "an end
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 113059
sentences or clauses containing first-person pronouns (especially "I" / "My") — i.e., author statements about their actions or issues.
gpt-5-mini
instead of InvokeMember. I've tried the InvokeMember
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 15110
The neuron mainly detects substrings of proper nouns and other uncommon named-entity-like tokens (fragmented names and rare/foreign words).
gpt-5-mini
Q:↵↵Why Dothraki's hut get burned so
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 108764
the neuron detects numeric tokens and numeric/measurement-related elements (numbers, dates, percentages, units).
gpt-5-mini
BSs from a joint perspective of engineering, legal and
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 102675
tokens that are database schema elements and structural identifiers (table names, column names, headers, and related DB/code identifiers).
gpt-5-mini
is one switch, CISCO SB SGE2010
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 61155
tokens marking the assistant role or the start of an assistant response.
gpt-5-mini
(async)<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵Here's a
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 110386
finds modal and auxiliary verbs (words expressing ability, necessity, or possibility such as can, must, will, be).
gpt-5-mini
`filter` function. This function takes two arguments:
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 52737
Tokens that are part of the assistant's generated output (i.e., the assistant role / response text).
gpt-5-mini
gangster rap<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵Verse 1
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 12399
Tokens that mark the assistant/response header and related system metadata (i.e., assistant role and start/end header tokens).
gpt-5-mini
that.<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵I'm sorry,
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 12412
The neuron detects code-like syntax tokens and programming-language keywords (i.e., places in the text that look like source code).
gpt-5-mini
!↵Sports:↵* Soccer↵* Indoor cricket↵*
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 73907
This neuron detects instruction/request verbs (imperative task words like "create", "write", "design", "teach", "make") that signal a user asking the model to perform a task.
gpt-5-mini
. Your task is to create a step-by-step model
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 27604
tokens representing numeric values, monetary amounts, quantities, and time units (numbers, $/NRE, thousands, weeks, etc.).
gpt-5-mini
C: NRE=$100,000, Unit cost
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 988
The neuron detects named entities—proper nouns like people, organizations, places, dates and other capitalized titles.
gpt-5-mini
For-eign↵Relations Committee,↵
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 24423
The neuron detects dismissive or minimizing language about mental-health problems that urges simplistic self-control instead of acknowledging real distress.
gpt-5-mini
not something that you can simply "snap out of"
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 119505
mentions of mobile app (or software) development requests, specifications, and price/estimate or project-planning language.
gpt-5-mini
ball park numbers to ensure the project can be started with
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 5252
Detects short impersonal third-person pronouns used as sentence subjects (the neutral "it"-type subject).
gpt-5-mini
.↵↵In some cases, it is beneficial to have germ
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 127834
This neuron detects mentions of central characters (proper names) and strong narrative actions—tokens that mark who is acting in the story.
gpt-5-mini
to McAndrews, Tom kidnaps her from the
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 121390
The neuron detects short name-like or handle-like tokens (author names, usernames, or blog/site handles).
gpt-5-mini
orth-it/↵======↵nostromo↵This is bad advice
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 88737