EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability method, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to support additional models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
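For readers unfamiliar with the TokenActivationPair strategy named above, the following is a minimal sketch of the idea as described in the paper: each token is shown to the explainer model next to its activation, after activations are clipped at zero and rescaled to integers from 0 to 10. The function names, the tab-separated layout, and the sample tokens and values are illustrative assumptions, not a copy of the repository's code or of any real feature's data.

```python
import math
from typing import List, Sequence


def normalize_activations(activations: Sequence[float], max_activation: float) -> List[int]:
    """Clip negative activations to 0 and rescale to integers in the range 0..10.

    max_activation is assumed to be the largest activation seen for this
    feature across all of its records (hypothetical parameter name).
    """
    if max_activation <= 0:
        return [0 for _ in activations]
    return [min(10, math.floor(10 * max(0.0, a) / max_activation)) for a in activations]


def format_token_activation_pairs(
    tokens: Sequence[str], activations: Sequence[float], max_activation: float
) -> str:
    """Render one text excerpt as one 'token<TAB>activation' pair per line."""
    normalized = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, normalized))


# Illustrative values only; these are not taken from a real feature.
example = format_token_activation_pairs(
    tokens=[" how", " many", " drinks"],
    activations=[8.0, 2.0, -1.0],
    max_activation=8.0,
)
print(example)
```

In this sketch the explainer model would see the first token with the highest score (10), the second with a low score (2), and the third with 0 because its negative activation is clipped.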
Recent Explanations
whether the text is an interrogative/question (i.e., signals a question being asked).
gpt-5-mini
↵"Q Okay, did youhow many drinks
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 30556
section headings, list titles, and other prominent document headings (e.g., numbered/topical or all-caps titles) in instructional or advice text.
gpt-5-mini
fully present!↵SIX KEYS TO STRONG EMOTIONAL
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 107672
Sentences or headings in docs that indicate contribution or how-to/installation instructions (e.g., "Contributing", "Running", "Create your feature branch").
gpt-5-mini
}↵```↵↵## Contributing↵↵1. Fork it
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 38630
whether the text is offering or requesting assistance — i.e., expressions of help/assistance.
gpt-5-mini
authorities to help in every way they can and use every
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 34588
tokens that are part of mathematical/LaTeX-style expressions (equations, operators, formulas and math-mode constructs).
gpt-5-mini
t+1},\tau))|x_{t},
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 45895
locations that mark the end of a turn or sentence (sentence/turn boundaries).
gpt-5-mini
members and to the future<end_of_turn>↵
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 64849
text that looks like Q&A/forum interface elements or metadata (e.g., "Best answer", "Ask a Question", usernames/tags, posting info).
gpt-5-mini
hard disks? – Fixya↵↵Near the<end_of_turn>↵
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 121939
sentences that express opinions, evaluations, recommendations, or other subjective judgments.
gpt-5-mini
are most at-risk."↵↵Areas of focus here
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 1580
the special end-of-turn marker token indicating the end of a turn in the transcript.
gpt-5-mini
Best Fan Artist↵↵*<end_of_turn>↵
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 13088
Mentions of interview events and the act of being interviewed (questions, interviewer interactions, and related hiring/offers).
gpt-5-mini
asked me a few standard questions about education, teaching,
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 17781
the presence of interrogative (WH) question words/phrases indicating a question.
gpt-5-mini
Devil really want, and how far will Clarice go
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 76976
Lines of direct speech or quoted dialogue (speaker attributions and tokens inside quotations).
gpt-5-mini
to the refrigerator."↵At one time during the
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 96507
mentions of feelings, emotions, consciousness, or first‑person statements about experiencing (or lacking) them.
gpt-5-mini
I don't have feelings, consciousness, or personal
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 4765
It detects words that signal similarity, relatedness, or membership in a group/category (e.g., related, similar, family, variety).
gpt-5-mini
well.↵For very similar reasons, async and await
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 98468
tokens that are standalone numbers (especially decimal or floating-point numerals) in the math/technical text.
gpt-5-mini
4ex2}<end_of_turn>↵
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 108785
The neuron detects sentence- or clause-initial capitalized content words (tokens that begin sentences, headings, or major clause transitions).
gpt-5-mini
herself and Sherryl. Sherryl argues that this finding
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 5219
sentences or phrases stating appellate conclusions about trial-court error and remedies (claims that the trial court erred and resulting holdings like reversal/remand or alternative relief).
gpt-5-mini
the agreement. In the alternative, defendants seek an order
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 24117
mentions of robots, androids, artificial intelligence, or machine/robotic agents.
gpt-5-mini
to change what Pepper the robot says when it is in
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 7824
the main finite verb or auxiliary (the predicate) of a clause/sentence.
gpt-5-mini
<bos> many micrometers are there in one tenth of a
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 55364
The neuron detects document-structure and markup tokens indicating section boundaries, headings, links, and other formatting/metadata.
gpt-5-mini
six worst states for LGBTQ<end_of_turn>↵
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 82458