© Neuronpedia 2026
    Gemma-3-12B-IT · 12-GEMMASCOPE-2-RES-16K · Feature 4167
    Explanations

    This prompt asks me to repeat the explanation for the neuron, so I need to re-evaluate it based on the provided lists and follow the rules. Let's re-examine the lists for patterns.

    **MAX_ACTIVATING_TOKENS**: The single most dominant token is `and`, which appears multiple times.

    **TOKENS_AFTER_MAX_ACTIVATING_TOKEN**: The tokens following `and` are `you`, `si`, `the`, `instead`, `cannot`, `find`, `cannot`, `need`, `you`, `cannot`. Of these, `cannot` occurs three times and `you` twice.

    **TOP_POSITIVE_LOGITS**:
    - `েনারেল` (a Bengali-script fragment of "general")
    - `forcibly`
    - `whatnot`
    - `forcefully`
    - `ೋತಿ` (a Kannada-script word fragment)
    - `visualise`
    - `capitalize`
    - `pioneered`
    - `an`
    - `розгля` (a fragment of the Ukrainian verb for "consider")

    **TOP_ACTIVATING_TEXTS**:
    - "If you create a form in google forms and you set that the email adresses of those who fill in the form will remain annonimous, can you change" uses `and` followed by `you`.
    - "Que se necesita para crear una I.A. y si puedes darme ejemplos de algunas I.A. como sus funciones o Características." (Spanish: "What is needed to create an AI, and if you can give me examples of some AIs, such as their functions or characteristics.") uses `y` ("and") followed by `si` ("if").

    Resulting explanation: **predicting "cannot" after "and"**
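    The frequency claims in the TOKENS_AFTER_MAX_ACTIVATING_TOKEN step can be checked with a short Python sketch. The token list is copied from the explanation; the variable names are illustrative only.

```python
from collections import Counter

# Tokens that immediately follow the max-activating token `and`,
# copied from the TOKENS_AFTER_MAX_ACTIVATING_TOKEN list in the explanation.
following_tokens = ["you", "si", "the", "instead", "cannot",
                    "find", "cannot", "need", "you", "cannot"]

counts = Counter(following_tokens)

# Print the two most frequent following tokens with their counts.
for token, n in counts.most_common(2):
    print(f"{token!r}: {n}")
# 'cannot': 3
# 'you': 2
```

    Counting the tokens after the max-activating position is what motivates the "cannot after and" label: `cannot` is the only follower that appears more than twice.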

    np_acts-logits-general · gemini-2.5-flash-lite
    Configuration
    google/gemma-scope-2-12b-it/resid_post/layer_12_width_16k_l0_medium
    Prompts (Dashboard): 238,145 prompts, 512 tokens each
    Dataset (Dashboard): lmsys + oasst1
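    The configuration names a residual-stream SAE (`resid_post`, layer 12, 16k width). As a rough illustration of how a single feature's activation could be computed from a residual-stream vector, here is a minimal sketch that ASSUMES a JumpReLU encoder, the architecture used by the original Gemma Scope release; this page does not confirm that Gemma Scope 2 uses the same encoder. The function name `jumprelu_encode`, the toy dimensions, and the thresholds are hypothetical.

```python
import numpy as np

def jumprelu_encode(x, W_enc, b_enc, threshold):
    """Encode residual-stream vectors into SAE feature activations.

    ASSUMPTION: a JumpReLU SAE. Each pre-activation is kept unchanged when it
    exceeds its feature's learned threshold and is set to zero otherwise.
    x: (n_tokens, d_model), W_enc: (d_model, width), b_enc/threshold: (width,)
    """
    pre = x @ W_enc + b_enc
    return np.where(pre > threshold, pre, 0.0)

# Toy sizes for illustration only; these are NOT the real model or SAE dimensions.
rng = np.random.default_rng(0)
d_model, width, n_tokens = 8, 16, 5
W_enc = rng.normal(size=(d_model, width))
b_enc = np.zeros(width)
threshold = np.full(width, 0.5)  # hypothetical per-feature thresholds

x = rng.normal(size=(n_tokens, d_model))
acts = jumprelu_encode(x, W_enc, b_enc, threshold)

feature = 3  # hypothetical stand-in for feature 4167 in the real 16k-wide SAE
print(acts[:, feature])  # one activation per token; zero wherever the feature is off
```

    The hard threshold is what makes such a feature sparse: most token positions produce exactly zero, which is consistent with the very low activation density reported for this feature.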

    Negative Logits
    `nın`  1.09
    ` więc`  1.07
    `です`  0.99
    `있`  0.97
    ` Highness`  0.96
    `同じ`  0.95
    `때`  0.93
    `ش`  0.93
    `ல்`  0.91
    `சம்`  0.90
    Positive Logits
    `েনারেল`  0.93
    ` forcibly`  0.83
    ` whatnot`  0.83
    ` forcefully`  0.79
    `ೋತಿ`  0.79
    ` visualise`  0.75
    ` capitalize`  0.75
    ` pioneered`  0.74
    `an`  0.74
    ` розгля`  0.74
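    Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. The sketch below shows that ranking step under the ASSUMPTION that no final-norm or other correction is applied; Neuronpedia's exact procedure is not stated on this page. The function `top_logits` and the toy matrices are hypothetical.

```python
import numpy as np

def top_logits(w_dec, W_U, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    ASSUMPTION: scores = decoder direction projected through the unembedding
    matrix, with no final-norm or other corrections applied.
    w_dec: (d_model,), W_U: (d_model, vocab_size)
    """
    scores = w_dec @ W_U          # (vocab_size,) logit change per token
    order = np.argsort(scores)    # ascending order of scores
    top_pos = order[::-1][:k]     # tokens the feature promotes most
    top_neg = order[:k]           # tokens the feature suppresses most
    return top_pos, top_neg, scores

# Toy example: 3-dimensional residual stream, 6-token vocabulary.
W_U = np.array([
    [3.0, -2.0, 0.0, 1.0, -1.0, 2.0],
    [0.5,  0.5, 0.5, 0.5,  0.5, 0.5],
    [0.0,  0.0, 0.0, 0.0,  0.0, 0.0],
])
w_dec = np.array([1.0, 0.0, 0.0])

top_pos, top_neg, scores = top_logits(w_dec, W_U, k=2)
print(top_pos, top_neg)  # [0 5] [1 4]
```

    Because this is a direct projection, it describes which output tokens the feature's write to the residual stream favors in isolation, not what the full model ends up predicting after later layers.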
    Activation density: 0.087%
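    Activation density is the fraction of token positions at which the feature is active. As a worked example, the sketch below converts the reported 0.087% into an approximate token count, under the ASSUMPTION that the density was measured over every token of the dashboard dataset (238,145 prompts of 512 tokens each); the page does not say which sample was used. The helper `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(acts):
    """Fraction of token positions at which a feature is active (> 0)."""
    return float(np.mean(np.asarray(acts) > 0))

# Dashboard dataset size, from the Configuration section.
num_prompts = 238_145
tokens_per_prompt = 512
total_tokens = num_prompts * tokens_per_prompt   # 121,930,240

# ASSUMPTION: the 0.087% density was measured over all of these tokens.
density = 0.087 / 100
approx_active = density * total_tokens
print(f"{total_tokens:,} tokens, roughly {approx_active:,.0f} active")
# 121,930,240 tokens, roughly 106,079 active
```

    Since 0.087% is itself rounded, the true count under this assumption lies somewhere between about 105,000 and 107,000 tokens.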
