EXPLANATION TYPE
oai_token-act-pair
Description
OpenAI's Automated Interpretability, from the paper "Language models can explain neurons in language models". Modified by Johnny Lin to support additional models and context windows.
Author
OpenAI
URL
https://github.com/hijohnnylin/automated-interpretability
Settings
Default prompts from the main branch, strategy TokenActivationPair.
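As a rough illustration of what a token-activation-pair prompt might contain, the sketch below scales raw activations to integers from 0 to 10 and writes one tab-separated token/activation pair per line. The function names, the floor-based scaling, and the sample activation values are assumptions made for illustration; the repository's actual prompt format may differ.

```python
import math
from typing import List, Sequence


def normalize_activations(activations: Sequence[float], max_activation: float) -> List[int]:
    """Scale raw activations to integers in 0..10 (hypothetical helper).

    Negative activations are clipped to 0 before scaling.
    """
    if max_activation <= 0:
        return [0 for _ in activations]
    return [
        min(10, math.floor(10 * max(0.0, a) / max_activation))
        for a in activations
    ]


def format_token_activation_pairs(
    tokens: Sequence[str],
    activations: Sequence[float],
    max_activation: float,
) -> str:
    """Render one "token<TAB>activation" pair per line (assumed layout)."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    scaled = normalize_activations(activations, max_activation)
    return "\n".join(f"{tok}\t{act}" for tok, act in zip(tokens, scaled))


# Made-up activation values, for illustration only.
example = format_token_activation_pairs(
    ["broke", " his", " nose"], [2.0, 0.5, 4.0], max_activation=4.0
)
print(example)
```

Scaling to a small integer range keeps the prompt compact and lets the explainer model compare activations across tokens without reading raw floating-point values.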
Recent Explanations
descriptions of serious bodily injury, especially in assault contexts, along with the resulting hospitalization, treatment, or legal ramifications.
gpt-5
so hard I broke his nose. Of course, he
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 89676
opening quotation marks that signal the start of a direct quote or reported speech.
gpt-5
examine that group.↵↵"We need to investigate alternative
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 50889
descriptions of physical hazards and injuries, especially accidents or animal attacks, and references to industrial safety/guarding measures that prevent them.
gpt-5
/TO) and machine guarding.↵↵Focus on Fundamentals
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 108839
mentions of domestic or gender-based violence and associated responses—such as units, cases, survivors, and support services like shelters, advocacy, safety planning, and legal protections.
gpt-5
), the DOMESTIC VIOLENCE UNIT, and the SEX
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 97954
distinctive proper nouns and named entities (uncommon names of people, places, organizations, brands, or acronyms).
gpt-5
precancers respectively. [unreadable] [unreadable
LLAMA3.1-8B-IT
15-RESID-POST-AA
INDEX 58233
Mentions of dogs, especially descriptions of dog body language, physical parts, and behavior in training or tracking contexts.
gpt-5-mini
tighten, and their tail carriage and ear positions will
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 20935
Detects when a text describes an entity (especially non-human or artificial) as sentient, conscious, or otherwise an animate/agent-like being.
gpt-5-mini
and the ship’s AI unresponsive. Cait works
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 41648
the neuron detects contracted/clitic tokens (apostrophes and the pieces of contractions like 'll, n't, 's, etc.).
gpt-5-mini
-making time. You’ll have to choose what
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 22150
numeric tokens and digit sequences in the text.
gpt-5-mini
stenJS:2014], which is beneficial
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 88563
the neuron detects salient content words / topic nouns — prominent subject nouns or concept keywords in the text.
gpt-5-mini
knowing/discovering your purpose saves you so much time
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 31885
standalone decimal numbers (floating-point values) appearing as isolated numeric tokens.
gpt-5
.↵-1<end_of_turn>↵
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 13364
the neuron activates for prominent topical content words (key nouns or subject words) in the text.
gpt-5-mini
How to Write a Handwritten Note↵Only three or
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 108278
the presence of apostrophes—tokens that are part of contractions or possessives (like 's, n't, 'm).
gpt-5-mini
me another difference. I'm<end_of_turn>↵
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 103702
mentions of alcohol, alcohol use disorder, and related treatments/biomedical terms (e.g., disulfiram, naltrexone, blood ethanol/acetaldehyde).
gpt-5-mini
rosate or disulfiram*Program Name and
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 124345
words that introduce a method, means, or instrument (e.g., prepositions like "by", "using", "through", "with" that signal how something is done).
gpt-5-mini
above, can be manipulated through direct silvicultural treatments
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 77441
The neuron activates on verbs describing actions (especially past-tense and present-participial forms like "developing," "studied," "taking," "started," "ran/ran").
gpt-5-mini
sson↵↵I have been developing a technology which best can
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 50875
The neuron detects salient content words — topic-bearing or focus words (important nouns, verbs, and adjectives) that carry the main meaning of a sentence.
gpt-5-mini
Epigenetic modifications play an important role during normal development
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 74379
the neuron detects numeric tokens and quantity-related tokens (numbers, digits, percentages, and similar numeric expressions).
gpt-5-mini
Zambia has been without any kind of game↵department at
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 32059
the start of a spoken utterance — an opening quotation mark or the beginning of direct dialogue.
gpt-5-mini
you read it.”↵↵“Of course I read it
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 120736
mentions of firefighting or extinguishing — references to fires, firefighting personnel, equipment, systems, or actions to put fires out.
gpt-5-mini
Temp Sensors↵↵best fire extinguishers buyers guide, wall
GEMMA-2-9B-IT
9-GEMMASCOPE-RES-131K
INDEX 6407