© Neuronpedia 2026
    Llama3.3-70B-IT · 50-RESID-POST-GF · Feature 35157
    Explanations

    end of structured text

Candidate explanations (model score):
- end of structured text
- parentheses and punctuation (0.90)
- closing punctuation (0.90)
- closing parenthesis, comma or colon (0.88)
- end of phrase punctuation (0.88)

Explainer reasoning:

The MAX_ACTIVATING_TOKENS are all forms of closing punctuation, such as `)` and `:`. They often appear at the end of a phrase or code structure.

The TOKENS_AFTER_MAX_ACTIVATING_TOKEN include things that look like labels or parts of text: `Description`, `FS`, `Py`, `NAME`, `dialogue`, `Anything`.

The TOP_ACTIVATING_TEXTS show examples where these closing punctuation marks appear before such labels or descriptions, or close parenthetical remarks like `(18+)` or `(MM-Teich)`.

The TOP_POSITIVE_LOGITS are a mix: `jpg` and `png` suggest file types, `ogy` is a word ending, `BeginInit` suggests code, `and` is a conjunction, and `"\n` is a quote followed by a newline. This list is less coherent with the punctuation pattern.

However, the dominant pattern seems to be the punctuation marks themselves and what follows them in context. The punctuation often marks the end of a parenthetical statement, a code block, or a list item. An explanation should capture:
- the punctuation marks themselves;
- the fact that they are *closing* punctuation;
- the context where they appear (the end of structures).

Possible phrases:
- closing punctuation followed by labels
- end of parenthetical or code
- punctuation ending phrases or code
- closing parentheses and punctuation

The prompt asks what the neuron *detects or predicts by finding patterns in lists*; it concerns features related to text. The core pattern is the presence of closing punctuation, and the subsequent tokens give context.

Focusing on the most concrete aspect, the punctuation itself and its role:
- The MAX_ACTIVATING_TOKENS are clearly closing punctuation.
- These marks are often followed by descriptive words or identifiers (the TOKENS_AFTER_MAX_ACTIVATING_TOKEN).
- The TOP_ACTIVATING_TEXTS show these marks ending parenthetical comments or code.

Candidate phrasings:
- "closing punctuation marks": too short and not specific enough.
- "punctuation ending parenthetical phrases": better, but "parenthetical" may be too specific.
- "ending punctuation followed by description": captures the relation.
- "end of phrase punctuation": a good general description.
- "closing punctuation": simple and direct.

Given examples like `)):?>`, `(18+)`, `(MM-Teich)`, and `1H,5H)-Pyrimidinedione`, the feature is about closing off a segment, often a parenthetical one, or ending a line of code or data. "Closing punctuation before labels" and "end of structured segments" are also worth considering, since the following tokens (`Description`, `FS`, `Py`, `NAME`, `dialogue`, `Anything`) look like labels or identifiers.

Final candidates, highlighting the punctuation's function as an endpoint:
- "end of structured text": covers code and descriptions.
- "closing punctuation": too simple.
- "parentheses and closing punctuation": good.
- "punctuation ending segments": also good.

Chosen explanation: end of structured text
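The reasoning above works from named lists such as MAX_ACTIVATING_TOKENS and TOKENS_AFTER_MAX_ACTIVATING_TOKEN. As a minimal sketch of how such lists could be assembled from per-token activations, the helper below picks each record's peak token and the token that follows it. The `ActivationRecord` format and both function names are hypothetical illustrations, not Neuronpedia's actual data model or pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ActivationRecord:
    """One prompt's tokens and the feature's activation at each token (hypothetical format)."""
    tokens: list[str]
    activations: list[float]


def max_token_and_next(record: ActivationRecord) -> tuple[str, str | None]:
    """Return the peak-activation token and the token right after it (None if it is last)."""
    i = max(range(len(record.activations)), key=record.activations.__getitem__)
    following = record.tokens[i + 1] if i + 1 < len(record.tokens) else None
    return record.tokens[i], following


def summarize(records: list[ActivationRecord], k: int = 5) -> tuple[list[str], list[str]]:
    """Build MAX_ACTIVATING_TOKENS- and TOKENS_AFTER_MAX_ACTIVATING_TOKEN-style lists
    from the k records with the highest peak activation."""
    top = sorted(records, key=lambda r: max(r.activations), reverse=True)[:k]
    pairs = [max_token_and_next(r) for r in top]
    max_tokens = [tok for tok, _ in pairs]
    tokens_after = [nxt for _, nxt in pairs if nxt is not None]
    return max_tokens, tokens_after


# Toy records with made-up activations, for illustration only.
records = [
    ActivationRecord(["(", "18", "+", ")", " Description"], [0.0, 0.1, 0.2, 2.5, 0.3]),
    ActivationRecord(["x", ":", " NAME"], [0.0, 1.7, 0.1]),
]
print(summarize(records))  # ([')', ':'], [' Description', ' NAME'])
```

Records are ranked by their single highest activation, so a prompt with one strong spike outranks one with many weak activations; a real explainer prompt may rank or sample differently.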

    Explainer: np_acts-logits-general · gemini-2.5-flash-lite
    Configuration
    SAE: Goodfire/Llama-3.3-70B-Instruct-SAE-l50/Llama-3.3-70B-Instruct-SAE-l50.pt
    Dashboard prompts: 10,000 prompts, 128 tokens each
    Dashboard dataset: lmsys/lmsys-chat-1m

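    Negative Logits
    `appa`: -0.10
    `bler`: -0.09
    `（`: -0.09 (fullwidth left parenthesis, stored as `ï¼Ī`)
    `%c`: -0.08
    `ase`: -0.08
    `aver`: -0.08
    `ella`: -0.08
    `、`: -0.08 (ideographic comma, stored as `ãĢģ`)
    `cpy`: -0.08
    `gh`: -0.08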
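Llama 3's Hugging Face tokenizer stores vocabulary entries in byte-level BPE form, which maps every raw byte to a printable character; a multi-byte character such as the fullwidth `（` is therefore stored as the string `ï¼Ī`. A minimal sketch for recovering readable text, assuming the GPT-2 byte-to-unicode table that byte-level BPE tokenizers use (`decode_token` is a hypothetical helper name, not a library API):

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style byte-level BPE table: map each byte value 0-255 to a printable character."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    n = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted into code points starting at U+0100.
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level vocabulary string back into readable UTF-8 text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["ï¼Ī", "ãĢģ", "âĦĸâĦĸ", "jpg"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

ASCII-only tokens such as `jpg` map to themselves, so only entries containing non-ASCII characters change; `errors="replace"` keeps partial multi-byte tokens from raising.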
    Positive Logits
    `ogy`: 0.11
    `jpg`: 0.11
    `adoo`: 0.09
    `BeginInit`: 0.09
    `IGINAL`: 0.09
    `png`: 0.09
    `CHKERRQ`: 0.09
    `and`: 0.09
    `"\n`: 0.09
    `№№`: 0.09 (two numero signs, stored as `âĦĸâĦĸ`)
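Positive and negative logit lists like these are commonly obtained by projecting a feature's SAE decoder direction through the model's unembedding matrix and taking the extremes. A minimal sketch of that projection on toy arrays, assuming `w_u` has shape `(d_model, vocab_size)` and ignoring any final normalization; whether this dashboard applies one is not stated on the page, and `top_logits` is a hypothetical helper:

```python
import numpy as np


def top_logits(w_dec_row: np.ndarray, w_u: np.ndarray, vocab: list[str], k: int = 10):
    """Project one SAE decoder direction through the unembedding matrix.

    w_dec_row: (d_model,) decoder vector for a single feature.
    w_u: (d_model, vocab_size) unembedding matrix.
    Returns the k most positive and k most negative (token, logit) pairs.
    """
    logits = w_dec_row @ w_u              # (vocab_size,) direct effect on each token's logit
    order = np.argsort(logits)            # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: d_model = 2, a 4-token vocabulary, made-up weights.
vocab = ["ogy", "appa", "jpg", "and"]
w_u = np.array([[0.5, -0.2, 0.1, 0.0],
                [9.0, 9.0, 9.0, 9.0]])
w_dec_row = np.array([1.0, 0.0])
pos, neg = top_logits(w_dec_row, w_u, vocab, k=2)
print(pos)  # [('ogy', 0.5), ('jpg', 0.1)]
print(neg)  # [('appa', -0.2), ('and', 0.0)]
```

With real weights, Hugging Face's `lm_head.weight` has shape `(vocab_size, d_model)`, so it would be transposed before use. The toy numbers above are unrelated to this feature's actual logits.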
    Activation density: 0.208%
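Activation density is the fraction of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the dashboard's exact rule is not stated here) and using a hypothetical `activation_density` helper:

```python
def activation_density(acts: list[list[float]], threshold: float = 0.0) -> float:
    """Fraction of all token positions whose activation exceeds `threshold`.

    acts: one list of per-token activations per prompt.
    """
    total = sum(len(prompt) for prompt in acts)
    active = sum(1 for prompt in acts for a in prompt if a > threshold)
    return active / total if total else 0.0


# Toy data: 2 prompts x 4 tokens, the feature fires on 2 of 8 positions.
print(activation_density([[0.0, 1.2, 0.0, 0.0], [0.0, 0.0, 0.5, 0.0]]))  # 0.25
```

If the 0.208% figure were measured over the dashboard's 10,000 prompts of 128 tokens (1,280,000 positions), it would correspond to roughly 2,662 active positions; this is an assumption, since the page does not say which token set the density is computed on.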
