© Neuronpedia 2026

    Neuronpedia

    Gemma-3-12B · 24-GEMMASCOPE-2-RES-16K · Index 12985
    Explanations

    The neuron seems to be identifying the end of multi-language words or phrases, often followed by grammatical components or abstract concepts related to variety or classification. Based on the combination of `TOKENS_AFTER_MAX_ACTIVATING_TOKEN` (like `ll`, `end`, `ve`) and `TOP_POSITIVE_LOGITS` (diverse, topics, conductas, individuals), a strong pattern emerges. The `MAX_ACTIVATING_TOKENS` being a space indicates it's looking at word boundaries. The tokens after suggest it's identifying common word endings. The top logits point to abstract concepts, often related to categories or attributes of entities.

    Considering the rules:
    - Concise explanation (3-20 words).
    - Find patterns.
    - Avoid listing tokens.
    - Do not start with "This neuron detects/predicts".
    - Do not mention "tokens" or "patterns".
    - Specific.

    Looking at the texts:
    - "categories (low-key, active, creative, etc.) and give a bunch of ideas within each." -> "categories", "ideas"
    - "range of budget levels." -> "levels"
    - "read the notes at the end" -> "end"
    - "potential income" -> "income"
    - "categorized by skill area" -> "categorized", "skill area"
    - "range of difficulty levels" -> "levels"
    - "designed around scenarios, Linux tools" -> "scenarios", "tools"
    - "organized it into modules, with increasing complexity." -> "modules", "complexity"

    The `TOKENS_AFTER_MAX_ACTIVATING_TOKEN` (`ll`, `end`, `ve`) are common word endings. The `TOP_POSITIVE_LOGITS` (`다양`, `等方面`, `темы`, `विविध`, `isms`, `ائنات`, `conductas`, `aquello`, `individuals`, `ждений`) are highly abstract and point towards concepts like diversity, types, aspects, topics, behaviors, and entities. The `TOP_ACTIVATING_TEXTS` show examples where various *types* or *categories* are being discussed, often with associated attributes or elements (e.g., "ideas within each", "budget levels", "skill area", "difficulty levels", "scenarios", "tools", "modules", "complexity"). The neuron seems to capture the concept of "items" or "entities" belonging to certain "categories" or having various "attributes", discussed in different languages.

    Let's try to combine these. The sequence `' '` followed by `ll`, `end`, `ve` could be the neuron detecting word endings in lists or enumerations. The `TOP_POSITIVE_LOGITS` then provide the abstract meaning. The idea of "lists" of "types" or "aspects" seems relevant. How about:
    - "items in lists" (3 words): a bit too simple.
    - "categories and attributes" (3 words): good, abstract.
    - "diverse categories of things" (4 words): also good.
    - "various aspects and classifications" (4 words): captures abstractness.
    - "types of items and their properties" (6 words): more descriptive.

    Let's re-examine `TOKENS_AFTER_MAX_ACTIVATING_TOKEN`:
    - `ll` (skill, well, will)
    - `end` (end, send)
    - `ll`
    - `ve` (have, give, live)

    These are very common English word endings. Combined with the space, the neuron is likely detecting the end of typical English words. The `TOP_POSITIVE_LOGITS` then provide the "meaning" of what those words represent in a broader context.

    Looking at the texts again, they often describe lists, categories, types, or collections of things:
    - "into categories (...) and give a bunch of ideas within each."
    - "range of budget levels."
    - "categorized by skill area (...) range of difficulty levels"
    - "course, designed around scenarios, Linux tools"
    - "organized it into modules"

    The neuron seems to capture the idea of listing or enumerating *types* of entities or concepts, often from a diverse set.

    Let's consider "classifications and examples":
    - "categories" -> classifications
    - "ideas within each" / "levels" / "skill area" / "scenarios", "tools" / "modules" -> examples or instances of those classifications.

    Another angle: `TOP_POSITIVE_LOGITS` contains words like `individuals`, `conductas` (behaviors), and `isms` (systems/doctrines), while `TOP_ACTIVATING_TEXTS` contains `active`, `creative`, `skill area`, `tools`, `complexity`, and `notes`. This neuron might be related to classifying subjects or entities and their characteristics or actions.

    Let's try to focus on the abstract, multilingual nature of `TOP_POSITIVE_LOGITS` combined with the specific structure suggested by `MAX_ACTIVATING_TOKENS` and `TOKENS_AFTER_MAX_ACTIVATING_TOKEN` (identifying word endings after a space). Pattern: common word endings after a space, linked to abstract concepts of diversity, types, or entities.

    Phrase ideas:
    - **classifications and attributes** (3 words): concise, and captures the essence of categories and their properties (like levels, skill areas, etc.).
    - **types of entities and their characteristics** (6 words): more descriptive.
    - **diverse listing of concepts** (4 words): captures generality and variety.
    - **categories and associated details** (4 words): good, but "details" might be too narrow.
    - **enumerated subjects and domains** (4 words): "subjects" and "domains" fit individuals, conductas, areas, and skills.

    Let's go with something that emphasizes the classification/categorization aspect along with the abstract nature. "Classifications and attributes" seems strong.

    Let's re-check the rule: "The explanation should be specific. For example, "unique words" is not a specific enough pattern, nor is "foreign words"." "Classifications and attributes" is specific in the sense that it points to these two correlated concepts.

    Consider the words in `TOP_ACTIVATING_TEXTS`:
    - 'independently' contains 'ent'.
    - 'important' has an 'nt' ending.
    - 'active' has a 've' ending, which matches `ve` in `TOKENS_AFTER_MAX_ACTIVATING_TOKEN`.
    - 'creative' has a 've' ending.
    - 'levels

    np_acts-logits-general · gemini-2.5-flash-lite
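Several of the rules the explainer lists above are mechanical enough to check automatically. A minimal sketch, assuming the constraints are exactly the 3 to 20 word limit, the banned "This neuron detects/predicts" opening, and the ban on the words "tokens" and "patterns"; `check_explanation` is a hypothetical helper, and the "specific" rule is left out because it cannot be tested this way.

```python
import re

# Assumed rule set, transcribed from the constraints listed in the explanation above.
MIN_WORDS, MAX_WORDS = 3, 20
BANNED_OPENINGS = ("this neuron detects", "this neuron predicts")
BANNED_WORDS = {"tokens", "patterns"}


def check_explanation(text: str) -> list[str]:
    """Return the rule violations for a candidate explanation (empty list if it passes)."""
    problems = []
    words = text.split()
    if not MIN_WORDS <= len(words) <= MAX_WORDS:
        problems.append(f"length {len(words)} words, expected {MIN_WORDS}-{MAX_WORDS}")
    lowered = text.strip().lower()
    if lowered.startswith(BANNED_OPENINGS):
        problems.append("starts with a banned opening")
    # Compare whole lowercase words only, so punctuation does not hide a match.
    found = set(re.findall(r"[a-z]+", lowered)) & BANNED_WORDS
    if found:
        problems.append(f"mentions banned words: {sorted(found)}")
    return problems


for candidate in [
    "classifications and attributes",
    "items in lists",
    "This neuron detects common word endings",
    "patterns of tokens",
]:
    print(f"{candidate!r}: {check_explanation(candidate) or 'ok'}")
```

The shortlisted phrases such as "classifications and attributes" pass these checks; whether they are specific enough still needs human judgment.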

    The neuron strongly activates on short all-caps sequences (technical acronyms or abbreviations).

    oai_token-act-pair · o4-mini · Triggered by @jyhe0408
    Configuration
    google/gemma-scope-2-12b-pt/resid_post/layer_24_width_16k_l0_medium
    Prompts (Dashboard)
    392,802 prompts, 256 tokens each
    Dataset (Dashboard)
    monology/pile-uncopyrighted
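The configuration path appears to encode the SAE's hook point, layer, dictionary width, and L0 sparsity setting, which would line up with the 24-GEMMASCOPE-2-RES-16K label (layer 24, residual stream, 16k features). The sketch below splits the path into those fields. The layout it expects is inferred from this single path only, and `SaeConfig` and `parse_sae_path` are hypothetical names, not part of any Gemma Scope or Neuronpedia API.

```python
import re
from dataclasses import dataclass


@dataclass
class SaeConfig:
    repo: str   # looks like a Hugging Face-style "<org>/<repo>" id (assumption)
    hook: str   # hook point, e.g. "resid_post" for the residual stream after a block
    layer: int  # transformer layer the SAE reads from
    width: str  # dictionary size label, e.g. "16k"
    l0: str     # sparsity setting label, e.g. "medium"


# Hypothetical layout: "<org>/<repo>/<hook>/layer_<n>_width_<w>_l0_<level>".
PATTERN = re.compile(
    r"^(?P<repo>[^/]+/[^/]+)/(?P<hook>[^/]+)/"
    r"layer_(?P<layer>\d+)_width_(?P<width>[^_]+)_l0_(?P<l0>[^_/]+)$"
)


def parse_sae_path(path: str) -> SaeConfig:
    """Split a path of the assumed layout into named fields."""
    match = PATTERN.match(path)
    if match is None:
        raise ValueError(f"unrecognized SAE path: {path}")
    fields = match.groupdict()
    return SaeConfig(
        repo=fields["repo"],
        hook=fields["hook"],
        layer=int(fields["layer"]),
        width=fields["width"],
        l0=fields["l0"],
    )


cfg = parse_sae_path("google/gemma-scope-2-12b-pt/resid_post/layer_24_width_16k_l0_medium")
print(cfg)
```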

    Negative Logits
    " two"  0.71
    " wenn"  0.70
    " delantero"  0.70
    " sebelah"  0.69
    " trzec"  0.64
    " seiner"  0.64
    " zwei"  0.63
    " striker"  0.62
    " två"  0.62
    " maxillary"  0.61
    Positive Logits
    " 다양"  0.93
    " विविध"  0.79
    "themes"  0.79
    "ต่างๆ"  0.79
    "各種"  0.78
    " разнообраз"  0.77
    "各类"  0.77
    "ائنات"  0.76
    "environments"  0.73
    "trivia"  0.71
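Logit lists like the negative and positive ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. The page does not state its method, so the sketch below assumes that projection and uses small random matrices: `W_dec`, `W_U`, and the placeholder vocabulary are illustrative stand-ins, and the output shows only how the ranking works, not this feature's actual values.

```python
import numpy as np

# Toy sizes; a real model's residual width and vocabulary are far larger.
D_MODEL, VOCAB, N_FEATURES = 8, 12, 4
rng = np.random.default_rng(0)

W_dec = rng.normal(size=(N_FEATURES, D_MODEL))  # hypothetical SAE decoder, one row per feature
W_U = rng.normal(size=(D_MODEL, VOCAB))         # hypothetical unembedding matrix
vocab = [f"tok{i}" for i in range(VOCAB)]       # placeholder token strings


def top_logits(feature: int, k: int = 3):
    """Project one feature's decoder row through the unembedding and rank the vocabulary."""
    logits = W_dec[feature] @ W_U               # one score per vocabulary entry
    order = np.argsort(logits)                  # indices sorted by ascending score
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]  # highest first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]        # lowest first
    return positive, negative


pos, neg = top_logits(feature=1)
print("positive:", pos)
print("negative:", neg)
```

Because the score is a plain dot product, tokens whose unembedding vectors point along the feature's direction rank highest, and those pointing against it rank lowest.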
    Activation Density: 0.121%
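Activation density is usually defined as the fraction of token positions at which a feature fires, that is, where its activation is above zero. The sketch below shows that calculation on toy data, together with the token count implied by the dashboard's 392,802 prompts of 256 tokens each. Whether the 0.121% figure was measured over exactly those tokens is an assumption, and `activation_density` is a hypothetical helper.

```python
import numpy as np

# Prompt count and length taken from the dashboard configuration.
N_PROMPTS, TOKENS_PER_PROMPT = 392_802, 256
total_tokens = N_PROMPTS * TOKENS_PER_PROMPT
print(total_tokens)  # 100557312

# If the 0.121% density were measured over exactly these tokens (an assumption),
# it would correspond to roughly this many firing positions:
print(round(total_tokens * 0.00121))  # 121674


def activation_density(acts: np.ndarray) -> float:
    """Fraction of token positions where the feature's activation is strictly positive."""
    return float(np.count_nonzero(acts > 0) / acts.size)


# Toy activations for 2 prompts x 5 tokens: 2 of the 10 positions fire.
toy = np.array([[0.0, 0.0, 3.2, 0.0, 0.0],
                [0.0, 0.7, 0.0, 0.0, 0.0]])
print(activation_density(toy))  # 0.2
```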

    No Known Activations