© Neuronpedia 2026

    Neuronpedia

    Gemma-3-27B-IT · 31-GEMMASCOPE-2-RES-262K · INDEX 10266
    Explanations

    1. definitions and classifications (IS-A)
    2. specific entities or names (SigmaLambda)
    3. requests or commands (Write)
    4. descriptions of aesthetic content (aesthetic photos)
    5. ongoing change or development (evolving)
    6. software or code (program)
    7. intense sentiment (massively)

    The common thread is these are all specific concepts or entities often found in structured data, definitions, or instructions.

    The most prominent and common theme across MAX_ACTIVATING_TOKENS and TOP_ACTIVATING_TEXTS are terms like "aesthetic", "IS-A", "SigmaLambda", "Write", "evolving", "program", and "massively".

    A good candidate phrase would describe these types of specific, often technical or descriptive phrases.

    "definitions, descriptions, instructions, and names" - too long.

    Let's try to find a more abstract concept that covers these. These are often about classifying things, describing states, or referring to specific entities.

    Consider the words: IS, Sigma, Write, evolving, program, aesthetic, massively.
    And text: IS-A, SigmaLambda, Write, evolving, program, aesthetic photos, massively depressed.

    The neuron seems to be identifying specific definitions, entities, instructions, or descriptors. Perhaps it's related to structured information or technical descriptions.

    What if we look at the *purpose* of these words/phrases?
    - IS-A: Classification.
    - SigmaLambda: Specific entity, technical name.
    - Write: Instruction.
    - evolving: Process, development.
    - program: Software, system.
    - aesthetic: Description, style.
    - massively: Intensity modifier.

    The variety is quite large. Let's re-evaluate the prompt's examples of good answers. They are usually very concise and pick out a core element.
    - "aesthetic photos" - `aesthetic` is in MAX. `photos` is in TOKENS_AFTER.
    - "IS-A" - `IS` is in MAX. `IS-A` is in TEXT.
    - "SigmaLambda" - `Sigma` is in MAX. `Lambda` is in TOKENS_AFTER.
    - "Write a witty..." - `Write` is in MAX.
    - "evolving with the rise of computers" - `evolving` is in MAX.
    - "rudimentary program" - `program` is in MAX.
    - "massively depressed" - `massively` is in TOKENS_AFTER.

    It seems to be picking up on specific identifiers, descriptive terms, and instructions.

    The presence of `IS-A`, `SigmaLambda`, `Write`, `program`, `aesthetic`, `evolving`, `massively` suggests it's interested in concrete, specific information types rather than abstract concepts.

    Let's simplify.
    `aesthetic`, `IS-A`, `SigmaLambda`, `Write`, `evolving`, `program`, `massively`
    These are all specific concepts or entities.

    The TOP_POSITIVE_LOGITS might be a red herring if they are non-English.

    Let's try to combine `aesthetic` and `program` with the idea of descriptions and instructions. Perhaps `descriptive terms and instructions`. Still a bit long.

    Consider the `IS-A` and `SigmaLambda`. These suggest classification or specific entities.
    - `Write` suggests direct command.
    - `aesthetic` suggests a descriptive style.
    - `evolving` suggests a process.
    - `program` suggests technical content.
    - `massively` suggests intensity.

    Maybe the neuron is about *categorizing* or *labeling* specific types of content or requests.

    Let's look at the words in MAX_ACTIVATING_TOKENS again: aesthetic, IS, ", Sigma, Write, evolving, is, program.
    And associated text examples: aesthetic photos, IS-A, SigmaLambda, Write, evolving, program.
    These are all pointing to specific conceptual categories or entities.

    "specific concepts and entities" is 4 words.
    "specific entities, classifications, and instructions" is 5 words.

    Let's refine: the neuron is activating for specific types of words/phrases that contribute to defining something, commanding something, or describing something distinct.

    Could it be about "specific types of definitions or requests"? 5 words.

    Let's focus on the MOST specific pattern.
    - `IS-A`: Classification.
    - `SigmaLambda`: Named entity/model.
    - `Write`: Command.
    - `aesthetic`: Descriptive adjective.
    - `evolving`: Process.
    - `program`: Noun for a system.

    These are all concrete pieces of information. The neuron seems to be picking out specific keywords that denote concrete subjects, actions, or states.

    Consider what is NOT happening: it's not about emotion generally, or simple grammar. It's about *specific* content.

    How about "specific descriptions, classifications, and commands"? 6 words.

    This seems to capture the essence well. `aesthetic` (description), `IS-A` (classification), `Write` (command). `SigmaLambda` and `program` fit under specific entities/subjects. `evolving` fits under process/description. `massively` fits under descriptive intensity.

    Let's re-read the rules: "concise explanation (3 to 20 words)", "single word, or phrase, or pattern", "specific".

    "specific descriptions, classifications, and commands" fits.

    Let's check the TOP_POSITIVE_LOGITS again for any clues.
    panneau, اي, پول, ﻜ, 🧈, ත්, ход, deto, ﺆ, האי
    These are mainly non-Latin characters or language-specific tokens.
    'panneau' means 'panel' in French.
    'اي', 'پول', 'ﻜ', 'ﺆ', 'האי' look like Arabic

    np_acts-logits-general · gemini-2.5-flash-lite
    Configuration
    google/gemma-scope-2-27b-it/resid_post/layer_31_width_262k_l0_medium
    Prompts (Dashboard)
    238,145 prompts, 512 tokens each
    Dataset (Dashboard)
    lmsys + oasst1
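
    The configuration identifier above packs several settings into one slash-separated path. As a minimal sketch, assuming the last segment is a sequence of underscore-separated key/value pairs (layer, width, l0), it could be split into named fields as follows. The `SaeConfig` class, the `parse_sae_path` helper, and the field names `organization` and `release` are hypothetical and introduced only for illustration; they are not part of any Neuronpedia or Gemma Scope API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaeConfig:
    """Hypothetical container for the parts of the configuration path."""
    organization: str
    release: str
    hook: str
    layer: int
    width: str
    l0: str


def parse_sae_path(path: str) -> SaeConfig:
    """Split a path such as the one shown on this page into named fields.

    Assumes exactly four slash-separated segments, with the last one made
    of alternating keys and values joined by underscores.
    """
    organization, release, hook, variant = path.split("/")
    parts = variant.split("_")  # ["layer", "31", "width", "262k", "l0", "medium"]
    fields = dict(zip(parts[0::2], parts[1::2]))
    return SaeConfig(
        organization=organization,
        release=release,
        hook=hook,
        layer=int(fields["layer"]),
        width=fields["width"],
        l0=fields["l0"],
    )


config = parse_sae_path(
    "google/gemma-scope-2-27b-it/resid_post/layer_31_width_262k_l0_medium"
)
print(config)
```

    Read this way, the path names a residual-stream (`resid_post`) dictionary at layer 31 with a width of 262k, which matches the 31-GEMMASCOPE-2-RES-262K source label.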

    Negative Logits
    " Guan"    0.43
    " guan"    0.42
    " ガ"      0.40
    "ुन"       0.40
    " raids"   0.40
    " mutter"  0.38
    " rig"     0.38
    " solen"   0.38
    " wre"     0.38
    "otip"     0.38
    Positive Logits
    " panneau"  0.49
    "اي"  0.48
    "پول"  0.46
    "ﻜ"  0.44
    "🧈"  0.43
    "ත්"  0.42
    "ход"  0.42
    "дето"  0.42
    "އ"  0.42
    " האי"  0.41
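
    The page does not say how the logit lists are computed. A common approach for sparse-autoencoder dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the highest- and lowest-scoring vocabulary tokens; whether this dashboard does exactly that is an assumption. The sketch below illustrates the idea on made-up toy numbers. The function `top_logit_tokens` and all values in it are hypothetical.

```python
def top_logit_tokens(decoder_direction, unembedding, vocab, k=2):
    """Rank vocabulary tokens by their projection onto a feature direction.

    decoder_direction: list of d_model floats (the feature's decoder vector).
    unembedding: d_model rows, each a list of vocab-size floats.
    Returns (k most positive, k most negative) as lists of (token, score).
    """
    scores = [
        sum(d * row[j] for d, row in zip(decoder_direction, unembedding))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)
    most_positive = ranked[:k]
    most_negative = ranked[-k:][::-1]  # most negative first
    return most_positive, most_negative


# Toy example: d_model = 2, vocabulary of 4 tokens (all values invented).
toy_vocab = ["a", "b", "c", "d"]
toy_direction = [1.0, 0.0]
toy_unembedding = [
    [0.5, -0.25, 0.0, 0.25],
    [1.0, 1.0, 1.0, 1.0],
]

positive, negative = top_logit_tokens(toy_direction, toy_unembedding, toy_vocab)
print(positive)
print(negative)
```

    Under this reading, tokens in the positive list are those the feature would push the model toward predicting, and tokens in the negative list are those it would push away from.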
    Activations Density 0.000%

    No Known Activations