EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to add new models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
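As a rough illustration of what a token-activation-pair strategy feeds to the explainer model, the sketch below pairs each token with a discretized activation, one pair per line. It is a minimal sketch and not the repository's code: the function names, the 0 to 10 scale, the round-down rule, and the tab separator are assumptions here, and the activation values in the example are invented for illustration.

```python
import math
from typing import List


def normalize_activations(activations: List[float], max_activation: float) -> List[int]:
    """Map raw activations to integers in 0..10 (assumed scale and rounding)."""
    if max_activation <= 0:
        return [0] * len(activations)
    # Negative activations are clamped to zero before scaling, then rounded down.
    return [min(10, math.floor(10 * max(a, 0.0) / max_activation)) for a in activations]


def format_token_activation_pairs(
    tokens: List[str], activations: List[float], max_activation: float
) -> str:
    """Render one 'token<TAB>activation' line per token (assumed layout)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    scaled = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, scaled))


if __name__ == "__main__":
    # Hypothetical activations; only the token text echoes an excerpt on this page.
    tokens = ["Your", " brain", " responds", " by"]
    activations = [0.0, 0.5, 4.0, 2.0]
    print(format_token_activation_pairs(tokens, activations, max_activation=4.0))
```

Discretizing to a small integer range keeps the prompt compact and lets the explainer compare activations across excerpts without seeing raw, feature-specific magnitudes.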
Recent Explanations
phrases indicating a response or reaction (e.g., "responded", "response", "in response", often with the following "by").
gpt-5-mini
decreases. Your brain responds by awakening you enough to tighten
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 128246
programming-language code tokens and structural keywords (i.e., parts of source code).
gpt-5-mini
)/p)/p)))/p)/p)*p
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 65537
passages that signal a judge's stance or authorship in a legal opinion — e.g., where a justice announces concurrence, dissent, or speaks in their judicial capacity.
gpt-5-mini
CK, Justice, dissenting.↵I concur in the
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 706
The neuron detects the start-of-sequence (beginning-of-document) token.
gpt-5-mini
<bos>The chiffonier stood a
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 34273
the neuron fires on numeric/math content — tokens that are numbers, math operators/terms, or words introducing calculations (e.g., "Solve", "Convert", numeric values, units).
gpt-5-mini
variables↵↵Recently I was studying something about random matrix theory
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 124611
It detects apostrophe characters (the single-quote/curly quote) used in contractions and possessives.
gpt-5-mini
Clifford Heath:↵↵What's needed is not just
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 14495
This neuron detects numeric tokens — numbers, years, measurements, zip/postal codes and other digit sequences (including decimals).
gpt-5-mini
between 20 to 25 meters in clear
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 22155
Questions asking for advice or instructions framed from the speaker's perspective (tokens like "I", "should", "do", "how should I", "what should I do").
gpt-5-mini
mon password que dois-je faire ? merci↵<
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 35748
the neuron detects numerical tokens and numeric/measurement references (e.g., measurements, years, table/figure/table-number labels).
gpt-5-mini
on the gate oxide film 14. The gate
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 114000
the neuron detects document-structure/formatting tokens (e.g., end-of-turn markers and other markup or boundary tokens).
gpt-5-mini
↵↵(XLSX)<end_of_turn>↵
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 81671
It detects negation fragments—roots of negative contractions like aren, wasn, haven, isn, don, didn, hasn (the "n't" parts).
gpt-5-mini
had already started stocking all sorts of special paper offerings,
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 90420
the neuron detects requests or expressions of need/obligation — tokens used when asking for help, stating what should/needs/would be done, or posing a question.
gpt-5-mini
a charm.↵What am I doing wrong with rendering
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 69001
Instances of first-person self-reference — tokens that mark the speaker referring to themselves (personal perspective/possession).
gpt-5-mini
↵<bos>. It took me a while to figure out
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 58028
mathematical notation and formula tokens (LaTeX/math expressions and other math-specific symbols/terms).
gpt-5-mini
e^{e^{-x}}\right)\right)\right
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 24986
the presence of third-person personal pronouns (especially singular forms like "he"/"she"/"he's") referring to people.
gpt-5-mini
apple of your eyes, he or she also needs appropriate
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 83111
sentences or phrases expressing strong emotion or subjective opinion (highly charged, evaluative language).
gpt-5-mini
any difference to julie.↵↵i don’t really
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 124054
tokens that are proper nouns—names of people, authors, taxa, or other titled entities (e.g., surnames, journal/species names).
gpt-5-mini
75-Ayrinhac1]--[@p
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 51047
It detects the start of a quoted utterance (opening quotation marks / the beginning of a dialogue turn).
gpt-5-mini
, not mine.”↵↵“But you made what Mr
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 102781
finds technical, domain-specific tokens and named entities typical of scientific/academic writing.
gpt-5-mini
rus pyrifolia*pectins \[[@B5
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 119209
It detects instructional/advice-oriented content — passages offering tips, how-to guidance, or suggestions.
gpt-5-mini
the many motivational things he said, the single thing I
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 128744