© Neuronpedia 2026

    Neuronpedia

    Google DeepMind · Exploring Gemma 2 with Gemma Scope > Gemma-2-9B-IT > Residual Stream - 131k > 20-GEMMASCOPE-RES-131K > Index 535
    Explanations

    narrative elements related to growth and progress

    oai_token-act-pair · gpt-4o-mini · Triggered by @bot

    action verbs and following words

    - **MAX_ACTIVATING_TOKENS**: implemented, used, be, begin, attempted, why, initiated, style, publish, tries, ate, development
    - **TOKENS_AFTER_MAX_ACTIVATING_TOKEN**: this, ', using, pulling, to, we, on, and, Nigerian, to, ,, of
    - **TOP_POSITIVE_LOGITS**: Kilder, ICONTAINER, tallas, Erreferentziak, utafitiHapana, pratiques, Erscheinung, diário, cortinas, jeito
    - **TOP_ACTIVATING_TEXTS**:
      - "...implemented this AML Policy..."
      - "...I used 'we' language..."
      - "...begin pulling your positions..."
      - "...They attempted to modernise..."
      - "...research is to attempt to define..."
      - "...Work has been initiated on..."
      - "...leadership style and how s/he organizes..."
      - "...us to publish Nigerian women’s take..."
      - "...it tries to accomplish in large part..."
      - "...I ate, slept, and breathe..."
      - "...motivated the development of doubly robust..."

    The `MAX_ACTIVATING_TOKENS` are mostly verbs, often indicating the start or completion of an action or a process. The `TOKENS_AFTER_MAX_ACTIVATING_TOKEN` list shows diverse following words, but the `TOP_ACTIVATING_TEXTS` examples show a pattern where these verbs are followed by short function words ('this', 'to', 'on', 'of', 'and'), indicating what the action applies to or is connected with. The `TOP_POSITIVE_LOGITS` are international words related to categories, references, or concepts, suggesting the neuron might be activating in contexts where specific actions are described in relation to broader categories or topics found in diverse texts.

    Considering the verbs themselves (implemented, used, begin, attempted, initiated, publish, tries, ate, development) and their common follow-ups ('this', 'to', 'on', 'and', 'of'), the neuron seems to capture descriptions of actions, processes, or states *initiating* or *continuing* in various contexts. The foreign words in TOP_POSITIVE_LOGITS might indicate the neuron is general enough to activate across different languages or domain-specific texts that use such categorizations.

    A good concise explanation would capture the essence of 'action' and 'context/continuation'.

    Checking the common pattern:
    - implemented **this**
    - used **'we'** (pronoun follows)
    - begin **pulling** (gerund follows)
    - attempted **to**
    - attempt **to**
    - initiated **on**
    - style (noun, followed by 'and')
    - publish **Nigerian** (proper noun follows)
    - tries **to**
    - ate **,** (punctuation follows)
    - development **of**

    There's a mix of what follows. However, the `MAX_ACTIVATING_TOKENS` themselves are the strongest signal. They are verbs that signify undertaking an action, a process, or a state of being/progress.

    Let's re-examine the examples:
    - implemented this AML Policy -> action + object
    - used 'we' language -> action + object
    - begin pulling -> action + gerund
    - attempted to modernise -> action + infinitive
    - attempt to define -> action + infinitive
    - initiated on -> action + preposition
    - leadership style -> noun, but 'style' is in MAX_ACTIVATING_TOKENS. The context is about leadership style.
    - publish Nigerian women’s take -> action + modifier + object
    - tries to accomplish -> action + infinitive
    - ate, slept -> verb + verb
    - development of -> noun + preposition

    The verbs in MAX_ACTIVATING_TOKENS point towards *acts of doing*, *processes*, or *starting/completing*. The TOP_POSITIVE_LOGITS are diverse, but many relate to categorization or metadata ('references', 'practices', 'appearance', 'sources'). This suggests the neuron might be looking for how actions are categorized or described in different contexts.

    If we focus on the verbs themselves and how they are used:
    - implemented
    - used
    - begin
    - attempted
    - initiated
    - publish
    - tries
    - ate
    - development

    These are mostly verbs describing actions or states that are being *performed* or *undergone*. The context in TOP_ACTIVATING_TEXTS often describes *how* something is done, *what* is done, or *why* it is done.

    Consider the foreign words: Kilder (Sources), Erreferentziak (References), pratiques (practices), diário (daily). These often relate to descriptions, reports, or ongoing activities.

    Let's try to find a pattern that encompasses these verbs and their typical context.
    - "uses verbs about undertaking actions" - too long.
    - "verbs about performing actions" - okay.
    - "actions and their context" - a bit abstract.

    What if it's about *how things function* or *how they are being done*? 'Style', 'development', 'implemented', 'used', 'initiated', 'began' all point to this.

    Let's look at the combination of a verb and its immediate aftermath. While not uniform, the verbs often lead into a noun phrase, an infinitive phrase, or a prepositional phrase specifying the action.
    - "verbs indicating process or implementation" might be good.
    - "actions being performed" is also fitting.
    - "descriptions of actions"

    Let's look at the TOP_POSITIVE_LOGITS again. `pratiques` (practices), `jeito` (way/manner), `Erscheinung` (appearance). These relate to how things are done or appear.

    Combined with `implemented`, `used`, `initiated`, `developed`, `attempted`, `tries`, `publish`, `began` - it's about how things are *done*, *started*, or *made*.

    "descriptors of actions and processes

    np_acts-logits-general · gemini-2.5-flash-lite
    Top Features by Cosine Similarity
    Comparing With GEMMA-2-9B-IT @ 20-gemmascope-res-131k
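A minimal sketch of how a "top features by cosine similarity" ranking can be computed. It assumes the comparison is between per-feature direction vectors (for example SAE decoder directions); the page does not say which vectors are used, and the feature names and two-dimensional vectors below are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: List[float], candidates: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Sort candidate features from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical direction vectors; real ones would have the model's hidden size.
query_direction = [3.0, 4.0]
candidate_directions = {
    "feature_a": [6.0, 8.0],    # same direction as the query
    "feature_b": [4.0, -3.0],   # orthogonal to the query
    "feature_c": [-3.0, -4.0],  # opposite direction
}

ranking = rank_by_similarity(query_direction, candidate_directions)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

Cosine similarity ignores vector length, so `feature_a` ranks first even though it is twice as long as the query.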
    Configuration
    google/gemma-scope-9b-it-res/layer_20/width_131k/average_l0_81
    Prompts (Dashboard): 24,576 prompts, 128 tokens each
    Dataset (Dashboard): monology/pile-uncopyrighted
    Features: 131,072
    Data Type: float32
    Hook Name: blocks.20.hook_resid_post
    Hook Layer: 20
    Architecture: jumprelu
    Context Size: 1,024
    Dataset: monology/pile-uncopyrighted
    Activation Function: relu
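The configuration names both `jumprelu` (Architecture) and `relu` (Activation Function). The sketch below shows how the two differ, assuming the usual JumpReLU definition in which each feature has its own threshold and any pre-activation at or below it is zeroed. The pre-activations and thresholds are made-up illustrative values, not parameters of this SAE.

```python
from typing import List


def relu(pre_acts: List[float]) -> List[float]:
    """Standard ReLU: keep positive pre-activations, zero the rest."""
    return [max(0.0, z) for z in pre_acts]


def jumprelu(pre_acts: List[float], thresholds: List[float]) -> List[float]:
    """JumpReLU: pass a pre-activation through unchanged only when it
    exceeds that feature's threshold; otherwise output zero."""
    return [z if z > t else 0.0 for z, t in zip(pre_acts, thresholds)]


# Hypothetical values for three features (illustration only).
pre_acts = [-0.5, 0.3, 2.0]
thresholds = [0.0, 1.0, 1.0]

relu_out = relu(pre_acts)
jumprelu_out = jumprelu(pre_acts, thresholds)
print("relu:    ", relu_out)
print("jumprelu:", jumprelu_out)
```

The second feature is the interesting case: ReLU lets the small positive value 0.3 through, while JumpReLU suppresses it because it does not clear the threshold of 1.0.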

    Negative Logits
    `:✨`  -0.66
    `MigrationBuilder`  -0.57
    `webElementXpaths`  -0.52
    `Personendaten`  -0.51
    `#+#`  -0.51
    `KommentareTeilen`  -0.46
    ` esternos`  -0.46
    `astify`  -0.46
    `()].`  -0.46
    `PCL`  -0.45
    Positive Logits
    `Kilder`  0.45
    `IContainer`  0.42
    ` tallas`  0.39
    `Erreferentziak`  0.39
    ` utafitiHapana`  0.39
    ` pratiques`  0.38
    ` Erscheinung`  0.36
    ` diário`  0.36
    ` cortinas`  0.36
    ` jeito`  0.36
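One common way to obtain per-token logit lists like these is to project a feature's decoder direction through the model's unembedding matrix and rank the vocabulary by the resulting score. The sketch below assumes that method; the page does not state how its values were computed. The two-dimensional vectors are made up, so the scores do not match the values listed above.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_dir: List[float], unembed: Dict[str, List[float]]
) -> Dict[str, float]:
    """Dot product of the feature direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted and the k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Hypothetical feature direction and unembedding vectors (illustration only).
decoder_dir = [1.0, 0.0]
unembed = {
    "Kilder": [0.5, 0.0],
    " jeito": [0.25, 1.0],
    "PCL": [-0.25, 0.5],
    "MigrationBuilder": [-0.5, 0.0],
}

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

Under this reading, a positive value means the feature pushes the model toward predicting that token when it fires, and a negative value means it pushes away from it.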
    Activations Density 0.047%
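As a rough back-of-the-envelope check, the density can be turned into an approximate token count. This assumes the density is the fraction of dashboard tokens on which the feature has a nonzero activation, and that it was measured over the 24,576 prompts of 128 tokens listed in the configuration; neither assumption is confirmed by the page, and the displayed 0.047% is presumably rounded.

```python
# Values taken from the page.
n_prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.047

# Total tokens in the dashboard run.
total_tokens = n_prompts * tokens_per_prompt

# Approximate number of tokens with a nonzero activation (assumed meaning of density).
expected_active = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens:,}")
print(f"tokens with nonzero activation: about {round(expected_active):,}")
```

On these assumptions the feature fires on roughly one token in every 2,100.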

    No Known Activations