© Neuronpedia 2026

    Neuronpedia

    Gemma-3-27B-IT · 12-GEMMASCOPE-2-TRANSCODER-262K · Feature 201292
    Explanations

    The explainer's reasoning, as recorded with the explanation:

    The user wants me to explain the neuron's behavior. The explanation must be 3 to 20 words, concise, and find a pattern. I should avoid phrases like "This neuron detects/predicts", "tokens", or "patterns". The first letter should not be capitalized unless it's a proper noun. Let's examine the provided lists:
    1. **MAX_ACTIVATING_TOKENS**: The tokens here are "включая" (Russian for "including"), "includes", and "including". These clearly point to the idea of inclusion or listing items.
    2. **TOKENS_AFTER_MAX_ACTIVATING_TOKEN**: The tokens following "including" or "включая" are varied: "продви", "сы", "a", ":", "Windows", "Linux", "Windows", "географи" (a fragment of Russian "geograph-"), "Shopify", "disease". This list shows diverse items that can follow the concept of "including", which suggests the neuron is often triggered when a list of items or features is being introduced.
    3. **TOP_POSITIVE_LOGITS**: Words like "legumes", "tomatoes", "blueberries", "appetizers", "included", and "شامل" (Arabic for "including/comprehensive") strongly reinforce the theme of listing items, categories, or comprehensive sets.
    4. **TOP_ACTIVATING_TEXTS**:

    Explanation: including or listing items
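The reasoning above works from per-token activation records. A minimal sketch of how its first two lists (the max-activating token of each record and the token right after it) could be assembled, assuming a hypothetical record layout of token strings paired with activation values. This is not Neuronpedia's actual pipeline, and the toy records are invented for illustration.

```python
from collections import Counter

# Hypothetical record layout: (tokens, per-token activations) for one feature.
Record = tuple[list[str], list[float]]


def max_and_next_tokens(records: list[Record]) -> tuple[Counter, Counter]:
    """Count the max-activating token in each record and the token right after it."""
    max_tokens: Counter = Counter()
    next_tokens: Counter = Counter()
    for tokens, acts in records:
        if not tokens or max(acts) <= 0:
            continue  # skip records where the feature never fires
        i = max(range(len(acts)), key=acts.__getitem__)  # position of peak activation
        max_tokens[tokens[i]] += 1
        if i + 1 < len(tokens):
            next_tokens[tokens[i + 1]] += 1
    return max_tokens, next_tokens


# Toy records (invented): the feature peaks on an "including"-style token.
records = [
    (["apps", " including", " Windows"], [0.0, 3.1, 0.2]),
    (["foods", " including", " tomatoes"], [0.0, 2.7, 0.1]),
    (["plan", " includes", ":"], [0.0, 1.9, 0.0]),
]
max_toks, next_toks = max_and_next_tokens(records)
print(max_toks.most_common(2))  # [(' including', 2), (' includes', 1)]
```

The counters keep raw token strings, including leading spaces, so that " including" and "including" stay distinct, as they are in a real tokenizer vocabulary.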

    Method: np_acts-logits-general · Explainer model: gemini-2.5-flash-lite
    Configuration
    google/gemma-scope-2-27b-it/transcoder_all/layer_12_width_262k_l0_small_affine
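A transcoder reads the input to an MLP layer and is trained to reproduce that layer's output through a wide, sparse feature layer; this feature is one unit of the layer-12 transcoder named above (`width_262k` presumably refers to roughly 262k features). The sketch below assumes a JumpReLU activation, as in the original Gemma Scope release, and reads the `affine` suffix as an extra linear skip term `W_skip` from input to output. Both readings are assumptions. All names (`transcoder_forward`, `W_enc`, `theta`, `W_skip`, ...) are hypothetical, and toy 2-dimensional vectors stand in for the real model width.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transcoder_forward(x, W_enc, b_enc, theta, W_dec, b_dec, W_skip):
    """Encode x into sparse features f and decode an estimate y of the MLP output."""
    pre = [p + b for p, b in zip(matvec(W_enc, x), b_enc)]
    # JumpReLU (assumed): keep a pre-activation only where it exceeds its threshold
    f = [p if p > t else 0.0 for p, t in zip(pre, theta)]
    recon = matvec(W_dec, f)
    skip = matvec(W_skip, x)  # assumed affine skip term read directly from x
    y = [r + b + s for r, b, s in zip(recon, b_dec, skip)]
    return f, y


# Toy sizes (hypothetical): 2-dim input and output, 3 features.
x = [1.0, 2.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_enc = [0.0, 0.0, -1.0]
theta = [1.5, 1.5, 1.5]
W_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
b_dec = [0.5, 0.5]
W_skip = [[0.1, 0.0], [0.0, 0.1]]
f, y = transcoder_forward(x, W_enc, b_enc, theta, W_dec, b_dec, W_skip)
print(f)  # [0.0, 2.0, 2.0]
```

The first feature's pre-activation (1.0) falls below its threshold (1.5), so it is zeroed; that thresholding is what keeps most of a wide transcoder's features silent on any given token.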
    Prompts (Dashboard): 238,145 prompts, 512 tokens each
    Dataset (Dashboard): lmsys + oasst1

    Negative Logits
    "on"                     0.77
    "f"                      0.68
    "j"                      0.64
    "is"                     0.61
    "an"                     0.59
    "k"                      0.59
    "v"                      0.57
    "h"                      0.52
    "是" (Chinese "is")       0.51
    "fing"                   0.50

    Positive Logits
    " legumes"               0.68
    " tomatoes"              0.63
    " knots"                 0.60
    " blueberries"           0.59
    " representatives"       0.59
    " شامل" (Arabic "comprehensive/including")   0.59
    " appetizers"            0.58
    " separators"            0.56
    " coups"                 0.56
    " included"              0.56
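A common way to produce logit tables like these is to project the feature's decoder direction onto the model's unembedding matrix and list the highest- and lowest-scoring vocabulary entries. This page does not state its method, so treat that as an assumption. The sketch below uses a hypothetical `top_logits` helper and a toy 2-dimensional model with a 5-token vocabulary, in pure Python.

```python
def top_logits(w_dec, unembed, vocab, k=3):
    """Score every vocab entry by w_dec · unembed[:, j]; return the k highest and k lowest."""
    scores = []
    for j, tok in enumerate(vocab):
        s = sum(w_dec[i] * unembed[i][j] for i in range(len(w_dec)))
        scores.append((s, tok))
    scores.sort(reverse=True)  # highest score first
    positive = [(tok, s) for s, tok in scores[:k]]
    negative = [(tok, s) for s, tok in reversed(scores[-k:])]  # most negative first
    return positive, negative


# Toy data (hypothetical): d_model = 2, vocabulary of 5 tokens.
vocab = [" legumes", " tomatoes", "on", "f", " included"]
w_dec = [1.0, -0.5]
unembed = [
    [0.7, 0.6, -0.8, -0.6, 0.5],
    [0.0, 0.0, 0.0, 0.2, -0.1],
]
pos, neg = top_logits(w_dec, unembed, vocab, k=2)
print([t for t, _ in pos])  # [' legumes', ' tomatoes']
print([t for t, _ in neg])  # ['on', 'f']
```

Because the score depends only on the decoder weights and the unembedding, not on any input text, such a table describes what the feature pushes the model to predict rather than what makes it fire.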
    Activation density: 0.003%
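To put the density in absolute terms, the arithmetic below multiplies it by the dashboard's token count, assuming density is measured over all 238,145 × 512 dashboard tokens (the page does not say which token set it uses). Because the displayed percentage is rounded, the result is only an order-of-magnitude figure.

```python
prompts = 238_145
tokens_per_prompt = 512
density = 0.003 / 100  # 0.003% expressed as a fraction

total_tokens = prompts * tokens_per_prompt
expected_active = total_tokens * density  # assumed: density is over all dashboard tokens
print(total_tokens, round(expected_active))  # 121930240 3658
```

So the feature would fire on only a few thousand of roughly 122 million tokens, consistent with a narrow trigger such as "including".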
