© Neuronpedia 2026

    Neuronpedia

    Gemma-3-27B-IT / 53-GEMMASCOPE-2-RES-262K / 18945
    Explanations

    lo, loaf, LoRA, loquacious, loafer, loafers, Lo, "lo"

    The pattern is clearly around the beginning "lo" sound, often followed by common words or specific technical terms.

    "lo" appears in many top activating texts: "loquacious", "loaf", "lo" (biblical), "lo" (as a sound in words like "polo"), "LoRA", "lo,". "lo" is also in `TOP_POSITIVE_LOGITS`. The tokens after `MAX_ACTIVATING_TOKENS` like `lov` (love?), `lo`, `loaf`, `lodge`, `Locale` also point to this "lo" sound. `MAX_ACTIVATING_TOKENS` has `flats` and `Pax` (related to Paxlovid). 'flats' could be related to 'ballet flats'. 'Pax' is likely related to Paxlovid.

    Let's look at the common themes:
    - Words starting with "lo": loquacious, loaf, lo, LoRA, lo(cal, etc.), loafer.
    - Words containing "lo" or sounding like it: polo, flats (ballet flats).
    - Related concepts: Paxlovid.

    The most consistent and dominant pattern is the "lo" sound/prefix. Many positive logits are variations of "Lo".

    Candidate explanations:
    - "lo" prefix
    - words starting with lo
    - lo sound
    - lo, loaf, LoRA

    "lo" prefix seems too general. "words starting with lo" is good but slightly long and mentions "words". "lo sound" is good. "lo, loaf, LoRA" are specific examples.

    Let's re-evaluate the `MAX_ACTIVATING_TOKENS` and `TOKENS_AFTER_MAX_ACTIVATING_TOKEN`.
    - `MAX_ACTIVATING_TOKENS`: `)`. `flats`, `Pax`, `"`, `each`, `cé`, `dis`, `shop`, `",`
    - `TOKENS_AFTER_MAX_ACTIVATING_TOKEN`: `,`, `lov`, `lo`, `loaf`, `*`, `ng`, `lodge`, `_`, `Locale`
    - Top Pos Logits: Lo, لو, Lo, LO, ло, lo, Ло, लो, LO, Ло (all variations of "Lo")

    The word "lo" itself appears as a token. "Loquacious" starts with "lo". "Loaf" starts with "lo". "LoRA" starts with "lo". "Lo" (biblical) starts with "lo". "Locale" starts with "lo". "Loosely" might be implied by "loquacious". "Loafers" and "flats" appear in texts, "loafers" starts with "lo", "flats" is in MAX_ACTIVATING_TOKENS. "Paxlovid" is mentioned, which contains "lo".

    Given the prompt's examples and rules:
    - "find patterns"
    - "concise (3 to 20 words)"
    - "explain what the neuron detects or predicts by finding patterns in lists."
    - "look at the lists and find a pattern"
    - "Do not mention "tokens" or "patterns""
    - "just say the pattern itself"

    The most prominent and recurring element across all lists is the "lo" sound or prefix. Explanations like "lo, loaf, LoRA" combine specific occurrences. Explanations like "lo sound" or "lo prefix" describe the underlying pattern. The instruction is to "explain the behavior of the neuron". The neuron *activates* for things containing "lo".

    Consider "lo", "lo, loaf", "lo, loaf, LoRA", "lo, loquacious, loaf", "loquacious, loaf, LoRA".

    Let's try to be more specific as per the rule: "The explanation should be specific. For example, "unique words" is not a specific enough pattern, nor is "foreign words"."

    The common characteristic is the "lo" prefix or sound. The top positive logits are all variations of "Lo". The texts contain "loquacious", "loaf", "lo" (biblical), "LoRA", "locale", "loafers". The tokens after include "lo", "loaf".

    The simplest, most direct representation of this recurring theme is "lo". However, that might be too short if it needs to be

    np_acts-logits-general · gemini-2.5-flash-lite
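As a rough check of the candidate "words starting with lo" against the two token lists quoted in the explanation above, the sketch below counts how many entries in each list begin with "lo" after stripping whitespace and case-folding. The helper names `starts_with_lo` and `lo_matches` are hypothetical, and the lists are transcribed by hand from the explanation text, so the counts are illustrative only.

```python
# Token lists hand-copied from the explanation text above (an assumption:
# the original tokenization may differ from this transcription).
MAX_ACTIVATING_TOKENS = [")", "flats", "Pax", '"', "each", "cé", "dis", "shop", '",']
TOKENS_AFTER_MAX_ACTIVATING_TOKEN = [",", "lov", "lo", "loaf", "*", "ng", "lodge", "_", "Locale"]


def starts_with_lo(token: str) -> bool:
    """Return True if the token begins with 'lo', ignoring case and surrounding whitespace."""
    return token.strip().casefold().startswith("lo")


def lo_matches(tokens: list[str]) -> list[str]:
    """Return the tokens that satisfy the 'starts with lo' candidate explanation."""
    return [t for t in tokens if starts_with_lo(t)]


on_max = lo_matches(MAX_ACTIVATING_TOKENS)
after_max = lo_matches(TOKENS_AFTER_MAX_ACTIVATING_TOKEN)

print(f"max-activating tokens starting with 'lo': {len(on_max)} of {len(MAX_ACTIVATING_TOKENS)}")
print(f"following tokens starting with 'lo': {len(after_max)} of {len(TOKENS_AFTER_MAX_ACTIVATING_TOKEN)}")
```

Under this transcription, the "lo" matches all fall in the list of following tokens rather than in the max-activating tokens, which would be consistent with the feature firing just before a "lo" token rather than on it. That reading is an inference from these short lists, not something the page states.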
    Configuration
    google/gemma-scope-2-27b-it/resid_post/layer_53_width_262k_l0_medium
    Prompts (Dashboard)
    238,145 prompts, 512 tokens each
    Dataset (Dashboard)
    lmsys + oasst1

    Negative Logits
    Token           Value
    "梫"            0.39
    "빤"            0.38
    "夶"            0.38
    "糠"            0.37
    " buscador"     0.36
    " ubicación"    0.36
    "fato"          0.35
    "✙"             0.34
    "Ꭸ"             0.34
    "𝕋"             0.33
    Positive Logits
    Token     Value
    " Lo"     2.83
    " لو"     2.81
    "Lo"      2.80
    " LO"     2.80
    " ло"     2.75
    " lo"     2.66
    " Ло"     2.63
    " लो"     2.59
    "LO"      2.56
    "Ло"      2.45
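The ten positive-logit tokens listed above differ only in case, leading whitespace, and script. The sketch below makes that explicit by normalizing each token and grouping it by the Unicode script of its first character. The function names `normalize` and `script_of` are hypothetical, and taking the first word of the Unicode character name as the script is a simplification that happens to work for these four scripts.

```python
import unicodedata
from collections import Counter

# Top positive-logit tokens as listed on the page; a leading space is part of the token.
POSITIVE_LOGIT_TOKENS = [" Lo", " لو", "Lo", " LO", " ло", " lo", " Ло", " लो", "LO", "Ло"]


def normalize(token: str) -> str:
    """Drop surrounding whitespace and fold case so spelling variants collapse together."""
    return token.strip().casefold()


def script_of(token: str) -> str:
    """Approximate the script from the Unicode name of the first character.

    For example, 'LATIN CAPITAL LETTER L' yields 'LATIN'.
    """
    first_char = token.strip()[0]
    return unicodedata.name(first_char).split()[0]


distinct_forms = {normalize(t) for t in POSITIVE_LOGIT_TOKENS}
script_counts = Counter(script_of(t) for t in POSITIVE_LOGIT_TOKENS)

print(sorted(distinct_forms))
print(dict(script_counts))
```

After normalization the ten tokens collapse to four forms, one each for Latin, Arabic, Cyrillic, and Devanagari script.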
    Activations Density 0.094%
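To put the density figure in absolute terms, the sketch below multiplies it by the size of the dashboard prompt set given under Configuration (238,145 prompts of 512 tokens each). It assumes the density was measured over exactly that set and that every prompt is a full 512 tokens; the page does not state either, so the result is an order-of-magnitude estimate. The function name `estimated_active_tokens` is hypothetical.

```python
# Figures copied from the page.
PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY_PERCENT = 0.094


def estimated_active_tokens(prompts: int, tokens_per_prompt: int, density_percent: float) -> float:
    """Estimate how many tokens activate the feature.

    Assumes the density applies to the full prompt set and that every
    prompt has exactly `tokens_per_prompt` tokens.
    """
    total_tokens = prompts * tokens_per_prompt
    return total_tokens * density_percent / 100


total_tokens = PROMPTS * TOKENS_PER_PROMPT
estimate = estimated_active_tokens(PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT)

print(f"{total_tokens:,} tokens in total, roughly {estimate:,.0f} with a nonzero activation")
```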

    No Known Activations