© Neuronpedia 2026

    Neuronpedia

    Gemma-3-12B-IT · 12-GEMMASCOPE-2-RES-16K · 2272
    Explanations

    The neuron seems to detect events, places, or specific items that are often preceded by "the" or are related to major world events/locations.

    Let's apply the rules:
    - 3 to 20 words: Yes.
    - Concise explanation of what it detects/predicts: Yes.
    - Finds patterns: Yes.
    - Avoids listing tokens: Yes.
    - No introductory phrases like "This neuron detects": Yes.
    - No "tokens" or "patterns": Yes.
    - No capitalization unless proper noun: Yes.
    - Specific: Seems specific enough.
    - Majority match: The pattern of "the" followed by specific nouns (locations, events) is strongly suggested by MAX_ACTIVATING_TOKENS and TOKENS_AFTER_MAX_ACTIVATING_TOKEN. TOP_POSITIVE_LOGITS points towards general events/locations as well (mundial, bicycles, bikes, स्थल).

    Let's refine based on the provided lists.

    MAX_ACTIVATING_TOKENS: Olympics, the, the, the, the, the, the, the, the, the
    TOKENS_AFTER_MAX_ACTIVATING_TOKEN: in, world, Hoover, Prado, Tulum, World, Dried, **, Royal

    The most consistent pattern is "the" or "Olympics" followed by specific places or event-related words. "Hoover" (Dam), "Prado" (Museum), "Tulum" (Ruins), and "Royal" (Castle/Grandeur) are all specific nouns; "world" is also present, and "Olympics" is a specific event. The combination of "the" and specific nouns/events strongly suggests a focus on named entities or specific occurrences. The TOP_POSITIVE_LOGITS also include "mundial" (world) and "स्थल" (place/site).

    A strong pattern is "the" followed by a place or event, as in "the Olympics", "the Hoover Dam", "the Prado Museum", "the Royal Castle", and "the World Cup". The neuron is activating based on specific nouns that often follow "the" or relate to major events/places. This hints at specific nouns following determiners or starting phrases.

    Possible explanations:
    - "the" followed by places or events
    - places and events after "the"
    - named places and events
    - specific locations and world events
    - Specific place names or events

    This neuron seems to detect noun phrases referring to specific places or events, often introduced by "the". Let's try to capture this essence concisely.

    1. places and events after "the"
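
Some of the rules in the explanation above are mechanical enough to check in code. The following is a minimal sketch, not Neuronpedia's own validation: `check_explanation`, `BANNED_WORDS`, and `INTRO_PREFIXES` are hypothetical names, and the list of introductory prefixes is an assumption. Judgment-based rules (concise, specific, majority match) are left out, and capitalized words are only surfaced for review because a script cannot tell whether they are proper nouns.

```python
import re

# Hypothetical helper covering only the mechanically checkable rules.
BANNED_WORDS = {"tokens", "patterns"}
# Assumed list of introductory phrasings; the source only names one example.
INTRO_PREFIXES = ("this neuron", "the neuron", "this feature")


def check_explanation(text: str) -> dict:
    words = text.split()
    # Strip punctuation and quotes so '"the"' is compared as 'the'.
    lowered = [re.sub(r"[^\w]", "", w).lower() for w in words]
    return {
        "word_count_3_to_20": 3 <= len(words) <= 20,
        "no_intro_phrase": not text.lower().startswith(INTRO_PREFIXES),
        "no_banned_words": not BANNED_WORDS.intersection(lowered),
        # Capitalized words may be proper nouns, so they are listed, not rejected.
        "capitalized_words": [w for w in words if w[:1].isupper()],
    }


final = check_explanation('places and events after "the"')
draft = check_explanation("The neuron seems to detect events and places")
print(final)
print(draft)
```

Under these assumptions the final candidate passes every mechanical check, while a draft that opens with "The neuron" is flagged for its introductory phrase.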

    np_acts-logits-general · gemini-2.5-flash-lite
    Configuration
    google/gemma-scope-2-12b-it/resid_post/layer_12_width_16k_l0_medium
    Prompts (Dashboard)
    238,145 prompts, 512 tokens each
    Dataset (Dashboard)
    lmsys + oasst1

    Negative Logits
    Token            Logit
    "ães"            0.88
    "ITH"            0.88
    "Assembl"        0.83
    "Ordinate"       0.83
    "cı"             0.82
    "Logged"         0.82
    "론"             0.82
    "וכ"             0.81
    " INTERNAL"      0.81
    "Publication"    0.80

    Positive Logits
    Token            Logit
    " mundial"       1.22
    " cupcakes"      1.12
    " bicycles"      1.08
    " bikes"         1.04
    " calving"       1.04
    " glories"       1.04
    " dapat"         1.03
    " brak"          1.03
    " cools"         1.03
    "स्थल"           1.03
    Activations Density 0.025%
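
To give the density figure a sense of scale, the arithmetic below combines it with the dashboard's prompt counts. This is a rough sketch under two assumptions the page does not confirm: that the density is the share of tokens on which the feature has a nonzero activation, and that it was measured over the same 238,145 prompts of 512 tokens each.

```python
# Figures copied from the dashboard.
prompts = 238_145
tokens_per_prompt = 512
density_percent = 0.025

# Assumption: density = share of dashboard tokens with nonzero activation.
total_tokens = prompts * tokens_per_prompt
estimated_active_tokens = total_tokens * density_percent / 100

print(f"{total_tokens:,} tokens in total")
print(f"about {estimated_active_tokens:,.0f} tokens with nonzero activation")
```

If those assumptions hold, the feature fires on roughly thirty thousand of about 122 million tokens, which is consistent with a sparse, specific feature.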