© Neuronpedia 2026
    Neuronpedia

    Google DeepMind · Exploring Gemma 2 with Gemma Scope / Gemma-2-9B-IT / Residual Stream - 131k / 20-GEMMASCOPE-RES-131K / Feature 28548
    Explanations

    references to organizational structures and categorization in various contexts

    oai_token-act-pair · gpt-4o-mini · Triggered by @bot

    1. units of time (weeks, decade, Days, monthly)
    2. geographical entities (cities, country, state, countries, region)
    3. categories (Category)
    4. body parts (shoulder)
    5. biological entities (animals)

    Many of these are followed by punctuation or conjunctions, common in lists. The explanation should be concise and capture this broad category.

    "units, locations, and categories" - 4 words.
    "time, places, categories, and body parts" - 6 words.
    "units, regions, and time" - 4 words.
    "categories, locations, and time units" - 5 words.

    Let's re-examine the top logits: scrapy, Administrativna, AttributeSet, bootstrapcdn, Generales, paksa, 全体, ganzes, RotationOrder, myſelf. These are mostly technical or foreign language words. This hints that the neuron might be activating on specific contexts where these words appear, and these contexts happen to contain the types of words in MAX_ACTIVATING_TOKENS.

    However, the *primary* job is to explain what the neuron detects *by finding patterns in lists*. The lists *most directly showing a pattern* are MAX_ACTIVATING_TOKENS and TOP_ACTIVATING_TEXTS. The foreign/technical words in TOP_POSITIVE_LOGITS might be *where* these patterns occur, but the pattern itself is the *types of words*.

    Let's categorize the MAX_ACTIVATING_TOKENS:
    - Time units/periods: weeks, decade, monthly, Days
    - Geographical locations/divisions: cities, country, state, countries, region
    - Biological/organic entities: animals
    - Abstract concepts/classifications: Category
    - Body parts: shoulder

    The most coherent pattern description would encompass these.

    "units, locations, categories, and time" seems good.
    "locations, time units, and categories"
    "geographical areas, time units, and classifications"
    "units of time, places, and entities"

    The TOP_POSITIVE_LOGITS are quite diverse and don't immediately suggest a single common theme with the other lists. They might represent specific names or technical jargon where these categories appear. So, I should focus on the pattern from the MAX_ACTIVATING_TOKENS and TOP_ACTIVATING_TEXTS.

    "units, locations, and entities" is 4 words.
    "locations, time units, and classifications" is 5 words.
    "places, times, and categories" is 4 words.

    Looking at the TOP_ACTIVATING_TEXTS again:
    - cities, governments
    - animals, humans
    - country (region)
    - days, weeks, months, years
    - decade
    - state
    - countries
    - monthly
    - region, state (cities)
    - shoulder
    - Days

    The dominant themes are clearly geographical/political divisions, time measurements, and some biological/categorical terms.

    locations, time units, and categories

    np_acts-logits-general · gemini-2.5-flash-lite
    Configuration
    google/gemma-scope-9b-it-res/layer_20/width_131k/average_l0_81
    Prompts (Dashboard): 24,576 prompts, 128 tokens each
    Dataset (Dashboard): monology/pile-uncopyrighted
    Features: 131,072
    Data Type: float32
    Hook Name: blocks.20.hook_resid_post
    Hook Layer: 20
    Architecture: jumprelu
    Context Size: 1,024
    Dataset: monology/pile-uncopyrighted
    Activation Function: relu
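The configuration lists both a `jumprelu` architecture and a `relu` activation function. As a rough illustration of how the two activations differ, here is a minimal sketch in plain Python. The threshold values are hypothetical and are not taken from this SAE; in a JumpReLU SAE each feature has its own learned threshold.

```python
def relu(pre_acts):
    """Standard ReLU: keep positive pre-activations, zero out the rest."""
    return [z if z > 0 else 0.0 for z in pre_acts]


def jumprelu(pre_acts, thresholds):
    """JumpReLU: keep a pre-activation unchanged only if it exceeds
    its per-feature threshold; otherwise output 0."""
    return [z if z > t else 0.0 for z, t in zip(pre_acts, thresholds)]


# Hypothetical pre-activations and thresholds, for illustration only.
pre_acts = [-0.5, 0.2, 0.9, 1.5]
thresholds = [0.5, 0.5, 0.5, 0.5]

print(relu(pre_acts))                  # [0.0, 0.2, 0.9, 1.5]
print(jumprelu(pre_acts, thresholds))  # [0.0, 0.0, 0.9, 1.5]
```

The small positive value 0.2 survives ReLU but is zeroed by JumpReLU because it falls below the threshold.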

    Negative Logits
    "__(/*!"  -0.45
    "WriteLiteral"  -0.36
    " gut"  -0.35
    " IBOutlet"  -0.35
    "CppCodeGen"  -0.35
    "Sla"  -0.34
    " cleave"  -0.34
    "abstractmethod"  -0.33
    "msgTypes"  -0.33
    "󠁴"  -0.33
    Positive Logits
    "scrapy"  0.52
    " Administrativna"  0.52
    " AttributeSet"  0.52
    "bootstrapcdn"  0.52
    " Generales"  0.50
    "paksa"  0.50
    "全体"  0.49
    " ganzes"  0.49
    "RotationOrder"  0.48
    " myſelf"  0.48
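The page does not say how these logit values are computed. One common approach, assumed here, is to project the feature's decoder direction onto each token's unembedding vector and then rank the tokens. The sketch below uses a tiny made-up vocabulary and made-up vectors; `logit_effects` and `top_and_bottom` are hypothetical helper names, not part of any Neuronpedia API.

```python
def logit_effects(decoder_dir, unembed):
    """Dot product of the feature's decoder direction with each
    token's unembedding vector (assumed method, toy data)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]


# Toy 3-dimensional example; real models use thousands of dimensions.
decoder_dir = [1.0, -0.5, 0.0]
unembed = {
    "alpha": [0.5, 0.0, 1.0],
    "beta": [0.0, 1.0, 0.0],
    "gamma": [0.25, 0.5, 0.0],
    "delta": [-1.0, 0.0, 2.0],
}

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), 2)
print(positive)  # [('alpha', 0.5), ('gamma', 0.0)]
print(negative)  # [('delta', -1.0), ('beta', -0.5)]
```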
    Activations Density 0.091%
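To give a sense of scale for this density, the arithmetic below assumes that density means the fraction of tokens on which the feature fires, and that it was measured over a token set the size of the dashboard prompts listed in the configuration. Neither assumption is confirmed by the page.

```python
# Figures from the configuration above.
prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.091

total_tokens = prompts * tokens_per_prompt
# Assumption: density = share of tokens with a nonzero activation.
expected_active_tokens = total_tokens * density_percent / 100

print(total_tokens)                   # 3145728
print(round(expected_active_tokens))  # 2863
```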

    No Known Activations