© Neuronpedia 2026

    Neuronpedia

    Gemma-3-12B · 24-GEMMASCOPE-2-RES-16K · 4564
    Explanations

    The neuron is detecting questions and instructions, including instructions in Spanish and Russian like "¿Para quién producir?" and "Для достижения". It also seems to be related to specific formatting elements like code blocks (`[` and `:`) and the start of text (`opening`).

    Considering the rules:
    - Concise (3-20 words)
    - Find patterns
    - Avoid "tokens", "patterns", "detects/predicts"
    - Do not capitalize first letter unless proper noun

    Let's look at the most specific patterns:
    * **`quién`** followed by **`producir`**: "who to produce" (Spanish question fragment)
    * **`the`** followed by **`opening`**: related to text structure, introductions.
    * **`[`** followed by **`достижения`**: seems like code block indicators or formatting before Russian "achievements".
    * **`a`** followed by **`company`** / **`foreign`**: common English phrases.
    * **`TOP_POSITIVE_LOGITS`**: `ken`, `ﻜ`, `istiche`, `ustible`, `umine`, `h`, `븅`, `smanship`, `sympathy`, `ForResult`. This looks like a mix of English suffixes, some less common characters, and potentially foreign language elements. `ForResult` could be Hindi/Sanskrit for "result." `istiche` has a German/Romance language feel. `manship` is an English suffix. `sympathy` is English. `umine` could be part of a word. `h` is very generic. `ken` is English.

    The most compelling pattern is the mix of languages and the nature of the words.
    Spanish: `quién`, `producir`
    Russian: `Для`, `достижения`
    English: `the`, `opening`, `a`, `company`, `foreign`, `ken`, `manship`, `sympathy`
    Possibly other languages suggested by `istiche`, `umine`, `ForResult`, `븅`.

    The `MAX_ACTIVATING_TOKENS` also has numbers (`6`, `0`) and punctuation (`:` `[`).
    The `TOKENS_AFTER_MAX_ACTIVATING_TOKEN` includes sentence endings (`.`), verbs (`producir`), and common nouns/adjectives (`opening`, `company`, `foreign`).

    The neuron seems to activate on segments that involve questions, instructions, technical formatting (like code brackets/colons), and a mix of international languages, often related to action, purpose, or results.

    Let's try to combine these.
    "questions, instructions, formatting, multilingual" - This is a bit too descriptive and lists categories.

    Looking closer at Spanish: "¿Para quién producir?" means "For whom to produce?". This fits with "Who, What, When, Where, Why" mentioned in the text.
    The Russian "Для достижения" means "for achievement/to achieve".
    The phrase "opening paragraph" in `TOP_ACTIVATING_TEXTS` is also significant. "opening" follows "the".

    Maybe it's about initiating or defining something, across languages.
    "initiating phrases and multilingual definitions" - 4 words.
    "multilingual questions instructions and definitions" - 5 words.
    "defining multilingual instructions and questions" - 5 words.
    "international structure and purpose queries" - 5 words.

    Let's revisit the top positive logits for hints.
    `ken`, `istiche`, `ustible`, `umine`, `manship`, `sympathy`, `ForResult`.
    These don't scream a central theme that's easily combined with the tokens. `sympathy`, `manship` are English nouns. `istiche` could be a suffix from German/Slavic/Romance languages. `ForResult` is likely Sanskrit/Hindi.

    multilingual definitions, questions, and instructions

    np_acts-logits-general · gemini-2.5-flash-lite

    The neuron responds to floating‐point numeric tokens (numbers with decimal points).

    oai_token-act-pair · o4-mini
    Triggered by @jyhe0408
    Configuration
    google/gemma-scope-2-12b-pt/resid_post/layer_24_width_16k_l0_medium
    Prompts (Dashboard)
    392,802 prompts, 256 tokens each
    Dataset (Dashboard)
    monology/pile-uncopyrighted
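
    The configuration above packs several settings into one identifier and gives the dashboard corpus as a prompt count and a per-prompt length. The sketch below is one way to unpack the identifier and work out the total token count. The helper `parse_sae_id` and its field names (`hook`, `layer`, `width`, `l0`) are assumptions made for illustration, not part of any Neuronpedia or Gemma Scope API, and the total assumes every prompt is exactly 256 tokens long.

```python
import re

# Values copied from the Configuration section of this page.
SAE_ID = "google/gemma-scope-2-12b-pt/resid_post/layer_24_width_16k_l0_medium"
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256


def parse_sae_id(sae_id: str) -> dict:
    """Split the identifier into named fields (field names are hypothetical)."""
    org, repo, hook, variant = sae_id.split("/")
    match = re.fullmatch(r"layer_(\d+)_width_([^_]+)_l0_([^_]+)", variant)
    if match is None:
        raise ValueError(f"unrecognized variant string: {variant!r}")
    layer, width, l0 = match.groups()
    return {
        "org": org,
        "repo": repo,
        "hook": hook,          # where in the model the SAE reads activations
        "layer": int(layer),
        "width": width,        # kept as a string, e.g. "16k"
        "l0": l0,              # sparsity setting label, e.g. "medium"
    }


config = parse_sae_id(SAE_ID)

# Total tokens in the dashboard corpus, assuming fixed-length prompts.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

if __name__ == "__main__":
    print(config)
    print(f"{total_tokens:,} tokens")
```

    Under these assumptions the corpus comes to roughly 100 million tokens, which gives a sense of scale for the "Activations Density 0.000%" figure reported further down.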

    Negative Logits
     éléments
    0.93
     étapes
    0.89
     tonnes
    0.83
    actéristiques
    0.82
     Itália
    0.82
     excessively
    0.80
     pezzi
    0.79
     समस्याओं
    0.79
     महिलाओं
    0.78
     étape
    0.77
    Positive Logits
    ken
    0.78
     sympathy
    0.72
    istische
    0.69
    h
    0.69
    ForResult
    0.68
    elijkheid
    0.68
    ustible
    0.68
    umine
    0.68
    .\\
    0.67
    Consent
    0.67
    Activations Density 0.000%

    No Known Activations