© Neuronpedia 2026

    Neuronpedia

    Google DeepMind · Exploring Gemma 2 with Gemma Scope · Gemma-2-9B-IT · Residual Stream - 131k · 20-GEMMASCOPE-RES-131K · 86938
    Explanations

    terms associated with biology, health, and pharmaceuticals

    oai_token-act-pair · gpt-4o-mini · Triggered by @bot

    cell shape motility drug resistance particle imaging sperm motility digital rights

    This is 8 words. It covers the main patterns observed.

    Re-evaluating: The instruction is to "find a pattern that helps you explain the behavior of the neuron". It doesn't have to be a list of all observed pairs. I should aim for a more abstract description if possible, or a representative pairing.

    Let's look at the *types* of things.
    - wine (noun) -> tasting (gerund/activity)
    - cell (noun) -> shape, attachment, motility (noun/concept)
    - drug (noun) -> resistance (noun/concept)
    - particle (noun) -> imaging (gerund/activity)
    - arian -> dictatorship (noun/concept)
    - test (noun/verb) -> data (noun)
    - function (noun) -> application (noun)
    - art (noun) -> index (noun)
    - sperm (noun) -> motility (noun/concept)
    - digital (adj) -> rights (noun)

    Many are (Noun) + (Related Concept/Activity). The variety means it's likely detecting specific terminologies from different domains.

    Considering the instruction "finding patterns in lists". The pattern is that certain keywords appear, and they are often followed by a specific second keyword or concept.

    Let's try to generalize:
    - "Specific terminology pairings" - too abstract, and mentions 'terminology'.
    - "keywords and their context" - might be ok, but still a bit vague.

    Let's look again at dominant themes in MAX_ACTIVATING_TOKENS: cell (3 times), wine, drug, particle, test, function, art, sperm, digital.

    The *tokens following* are also varied: shape, attachment, motility, resistance, imaging, dictatorship, data, application, index, rights.

    The pairing seems to be key: "wine tasting", "cell shape", "drug resistance", "particle imaging", "sperm motility", "digital rights".

    What about "arian" -> "dictatorship"? This is less common. "test" -> "data" is common. "function" -> "application" is common. "art" -> "index" is specific but valid.

    Maybe the pattern is specific noun + related noun/concept.

    Alternative simpler explanations:
    - "specific noun-concept pairs" - mentions 'pairs'.
    - "domain specific concepts" - too abstract, mentions 'concepts'.

    Let's focus on the nature of the *tokens*. They are often building blocks of specific phrases or technical terms.
    "wine", "cell", "drug", "particle", "test", "function", "art", "sperm", "digital".
    "tasting", "shape", "attachment", "resistance", "imaging", "data", "application", "index", "motility", "rights".

    The pattern is finding common pairings.
    - "word and its common follow-up" - too literal.

    Maybe focus on the *specificity* of the pairings.
    - "specific word pairings" - Getting closer. This is 3 words. It fits the length.
    - "specific terms and related concepts" - 5 words.

    Let's check TOP_POSITIVE_LOGITS: DeleteBehavior, blice, ьаж, mybatisplus, MainAxisSize, дописавши, mpo, ctron, Английский, atee.

    These look like tokens related to code, programming, possibly different languages due to `ьаж`, `дописавши`, `Английский`. This list doesn't directly help with the *noun-concept* pairing seen in the other lists. It might indicate the neuron has multiple roles or the lists are from different neurons. Assuming it's from one neuron and these are all relevant inputs, the noun-concept pairing is the most consistent *across the MAX_ACTIVATING_TOKENS and TOKENS_AFTER_MAX_ACTIVATING_TOKEN lists*.

    Let's stick to the most robust pattern.
    - wine tasting
    - cell shape
    - cell attachment
    - cell motility
    - drug resistance
    - particle imaging
    - test data
    - function application
    - art index
    - sperm motility
    - digital rights

    Generalizing this: A specific word followed by a conceptually related word.
    - "specific word pairings" - 3 words.
    - "domain-specific word pairings" - 4 words.

    specific word pairings

    np_acts-logits-general · gemini-2.5-flash-lite
    Configuration
    google/gemma-scope-9b-it-res/layer_20/width_131k/average_l0_81
    Prompts (Dashboard): 24,576 prompts, 128 tokens each
    Dataset (Dashboard): monology/pile-uncopyrighted
    Features: 131,072
    Data Type: float32
    Hook Name: blocks.20.hook_resid_post
    Hook Layer: 20
    Architecture: jumprelu
    Context Size: 1,024
    Dataset: monology/pile-uncopyrighted
    Activation Function: relu
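
The configuration lists `jumprelu` as the architecture and `relu` as the activation function. As a rough illustration of how the two differ, the sketch below applies both to the same pre-activations. It assumes the standard JumpReLU definition (a pre-activation passes through unchanged only when it exceeds a per-feature threshold). The function names, input values, and thresholds are made up for illustration and are not taken from this SAE.

```python
from typing import List


def relu(pre_acts: List[float]) -> List[float]:
    """Plain ReLU: negative pre-activations are clamped to zero."""
    return [max(z, 0.0) for z in pre_acts]


def jumprelu(pre_acts: List[float], thresholds: List[float]) -> List[float]:
    """JumpReLU: a pre-activation survives only if it exceeds its threshold.

    The thresholds are learned per feature in a real SAE; the values used
    below are hypothetical.
    """
    if len(pre_acts) != len(thresholds):
        raise ValueError("pre_acts and thresholds must have the same length")
    return [z if z > t else 0.0 for z, t in zip(pre_acts, thresholds)]


# Hypothetical pre-activations for three features and their thresholds.
pre = [0.05, 0.8, -0.3]
theta = [0.1, 0.5, 0.2]

print(relu(pre))             # [0.05, 0.8, 0.0]
print(jumprelu(pre, theta))  # [0.0, 0.8, 0.0]
```

The first feature shows the difference: ReLU keeps the small positive value, while JumpReLU zeroes it because it falls below the threshold.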

    Negative Logits
    oa̍t
    -0.42
     propOrder
    -0.40
    alakip
    -0.36
    guantes
    -0.36
    addCriterion
    -0.33
     ErrIntOverflow
    -0.32
    Ингредиенты
    -0.32
     esfuer
    -0.32
    AlterField
    -0.31
     Paglinawan
    -0.31
    Positive Logits
    DeleteBehavior
    0.57
    blico
    0.54
    ьаж
    0.54
    mybatisplus
    0.52
     MainAxisSize
    0.52
     дописавши
    0.50
    mpo
    0.49
    ctron
    0.49
    الإنجليزية
    0.48
    atee
    0.48
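
One common way to obtain per-token scores like those in the logit lists is to project a feature's decoder direction onto each token's unembedding vector. The sketch below assumes that method; the page itself does not say how the values were computed. The two-dimensional vectors and the token names `a`, `b`, `c` are toy values, not data from this feature.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the decoder direction with each token's unembedding row."""
    return {
        token: sum(d * w for d, w in zip(decoder_dir, row))
        for token, row in unembed.items()
    }


def top_k(effects: Dict[str, float], k: int,
          positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k most positive (or most negative) token scores."""
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)[:k]


# Toy decoder direction and unembedding rows (hypothetical values).
decoder_dir = [1.0, 0.0]
unembed = {"a": [0.5, 9.0], "b": [-0.4, 1.0], "c": [0.1, -2.0]}

effects = logit_effects(decoder_dir, unembed)
print(top_k(effects, 1, positive=True))   # [('a', 0.5)]
print(top_k(effects, 1, positive=False))  # [('b', -0.4)]
```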
    Activations Density 0.029%
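
To put the density figure in context, the short calculation below estimates how many dashboard tokens it corresponds to. It assumes the density is the fraction of dashboard tokens (24,576 prompts of 128 tokens each) on which the feature is active; the page does not define the term, so treat the result as a rough estimate. The function name is illustrative.

```python
def estimate_active_tokens(prompts: int, tokens_per_prompt: int,
                           density_percent: float) -> int:
    """Estimate the number of tokens on which a feature fires.

    Assumes density_percent is the percentage of all dashboard tokens
    with a nonzero activation.
    """
    total_tokens = prompts * tokens_per_prompt
    return round(total_tokens * density_percent / 100)


total = 24_576 * 128
print(total)                                       # 3145728
print(estimate_active_tokens(24_576, 128, 0.029))  # 912
```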

    No Known Activations