© Neuronpedia 2026

    Neuronpedia

    Gemma-3-12B · 24-GEMMASCOPE-2-RES-16K · Index 11214
    Explanations

    The pattern seems to revolve around specific tokens like "Pro", "exit", "produced", "Where", and "year", followed by punctuation or common words. The positive logits are mostly non-Latin characters.

    Looking for shared meaning or context:
    - "Pro" (often followed by a comma or 'Max' which isn't in after-tokens)
    - "exit" (followed by "no-sorry mode", "to this un-exitable context")
    - "produced gunpowder" (specific historical context)
    - "fruit retailer" (business context)
    - "put things" (reminders)
    - "Where" (locations, followed by "." or "the")
    - "year" (followed by commas or dates)

    The positive logits signal the neuron might be sensitive to different character sets or languages, or perhaps certain symbolic representations.

    Let's re-examine MAX_ACTIVATING_TOKENS:
    `-`, `Pro`, `exit`, `produced`, `a`, `put`, `Where`, `.`, `where`, `year`

    And TOKENS_AFTER_MAX_ACTIVATING_TOKEN:
    `[`, `)`, `.`, `gunpowder`, `fruit`, `things`, `the`, `If`, `you`, `,`

    Some combinations:
    - `Pro` followed by `.` or `[` (from top texts)
    - `exit` followed by `.` or `,`
    - `produced` followed by `gunpowder`
    - `a` followed by `.` or `fruit` or `things` or `the` or `If` or `you` or `,` (very common)
    - `put` followed by `things`
    - `Where` followed by `.` or `the`
    - `.` followed by `gunpowder` or `fruit` or `things` or `the` or `If` or `you` or `,`
    - `where` followed by `the`
    - `year` followed by `,`

    The presence of punctuation and common words after many of these tokens suggests a structural role or a transition. The non-Latin logits are a strong hint that the neuron is picking up on something beyond English.

    Let's consider the *meaning* or *function*:
    - "Pro" could relate to professionalism, pros/cons, or product names.
    - "exit" is about leaving or ending.
    - "produced" is about creation/generation.
    - "Where" is about location.
    - "year" is about time.
    - The non-Latin characters might indicate internationalization, code, or specific symbols.

    The pattern seems to be detecting specific keywords, often followed by punctuation or common article/prepositional phrases, and potentially handling multiple languages or character sets.

    Trying to simplify:
    - "keywords followed by punctuation" - too generic.
    - "multilingual keywords" - captures the logits but not fully the tokens.
    - "discrete tokens followed by common separators" - too technical.

    Let's look at `TOP_POSITIVE_LOGITS` again:
    `м`, `im`, `م`, `Б`, `Х`, `enkelt`, `U`, `ara`, `مم`, `М`

    `enkelt` is Norwegian/Swedish for "simple". `ara` appears in many languages.

    If we focus on the English tokens:
    `Pro`, `exit`, `produced`, `a`, `put`, `Where`, `where`, `year`.
    These are quite common words.

    The combination of common English words mixed with non-Latin scripts and often followed by punctuation or common grammatical words points towards a parser's helper or a component that breaks down text structurally and potentially across languages.

    What if the neuron detects 'start/end' markers or specific grammatical structures? "exit" can mark an end. "Where" can mark a query start. "produced" marks an event. "Pro" can start a list item. The `.`, `[`, `)`, `,` after tokens are very common structural elements.

    Let's consider the possibility of language detection or specific formatting.

    keywords and common punctuation

    np_acts-logits-general · gemini-2.5-flash-lite

    The neuron is highly activated at paragraph or section boundaries—that is, it detects the start of a new block of text.

    oai_token-act-pair · o4-mini · Triggered by @jyhe0408
    Configuration
    google/gemma-scope-2-12b-pt/resid_post/layer_24_width_16k_l0_medium
    Prompts (Dashboard): 392,802 prompts, 256 tokens each
    Dataset (Dashboard): monology/pile-uncopyrighted
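
    The configuration values above can be unpacked with a short script. This is only a sketch: the field names (`layer`, `width`, `l0`) are our own labels inferred from the identifier string, not documented Neuronpedia or Gemma Scope terminology, and the token total assumes every prompt really contains exactly 256 tokens.

```python
import re

# Values copied from the Configuration section of this page.
SAE_ID = "google/gemma-scope-2-12b-pt/resid_post/layer_24_width_16k_l0_medium"
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256


def parse_sae_id(sae_id: str) -> dict:
    """Split the identifier into labelled parts.

    The labels are assumptions read off the string itself,
    not an official schema.
    """
    org, release, hook, variant = sae_id.split("/")
    match = re.fullmatch(r"layer_(\d+)_width_(\w+?)_l0_(\w+)", variant)
    if match is None:
        raise ValueError(f"unexpected variant format: {variant!r}")
    layer, width, l0 = match.groups()
    return {
        "org": org,
        "release": release,
        "hook": hook,
        "layer": int(layer),
        "width": width,
        "l0": l0,
    }


config = parse_sae_id(SAE_ID)
# Upper bound on dashboard tokens, assuming no prompt is shorter than 256.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

print(config)
print(f"total tokens: {total_tokens:,}")
```

    Under those assumptions the dashboard covers roughly 100 million tokens, which is useful context for the 0.000% activation density reported further down.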
    Negative Logits
    ` unsound`  0.87
    `を引き`  0.81
    `ется`  0.80
    `ப்பாக`  0.79
    `放心`  0.77
    ` Rhino`  0.75
    ` गाड़ियों`  0.75
    `টিভ`  0.73
    ` Kitten`  0.72
    ` Pies`  0.71

    Positive Logits
    `м`  0.98
    `im`  0.77
    `Б`  0.77
    `م`  0.73
    ` enkelt`  0.72
    ` doua`  0.70
    ` žmog`  0.70
    `مم`  0.68
    ` НА`  0.68
    `uas`  0.68
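
    The first explanation claims that the positive logits are mostly non-Latin characters. One way to check that against the ten tokens listed above is to tally them by script. The sketch below uses only Python's standard `unicodedata` module and takes the first word of each character's Unicode name as a rough script label; that heuristic, and the helper name `script_of`, are our own assumptions, not part of Neuronpedia.

```python
import unicodedata
from collections import Counter

# Top positive-logit tokens as listed on this page (leading spaces kept).
POSITIVE_LOGIT_TOKENS = [
    "м", "im", "Б", "م", " enkelt",
    " doua", " žmog", "مم", " НА", "uas",
]


def script_of(token: str) -> str:
    """Return a rough script label based on the token's first letter."""
    for ch in token:
        if ch.isalpha():
            # Unicode names begin with the script,
            # e.g. "CYRILLIC SMALL LETTER EM".
            return unicodedata.name(ch).split()[0]
    return "NONE"  # token contains no alphabetic character


script_counts = Counter(script_of(t) for t in POSITIVE_LOGIT_TOKENS)
for script, count in script_counts.most_common():
    print(f"{script}: {count}")
```

    On the tokens as listed here, the tally should come out close to an even split between Latin-script tokens and Cyrillic or Arabic ones, so "mostly non-Latin" may overstate the case. Note that the explanation quotes a slightly different logit list than the table on this page.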
    Activations Density 0.000%

    No Known Activations