© Neuronpedia 2026

    Neuronpedia

    Gemma-3-27B-IT · 38-GEMMASCOPE-2-TRANSCODER-262K · 352
    Explanations

    The pattern is related to numbers associated with specific resources, and the top positive logits are words for "women".

    Looking at `MAX_ACTIVATING_TOKENS` and `TOKENS_AFTER_MAX_ACTIVATING_TOKEN`:
    - `MAX_ACTIVATING_TOKENS`: `5`
    - `TOKENS_AFTER_MAX_ACTIVATING_TOKEN`: `3`, `6`

    These numbers appear in the `TOP_ACTIVATING_TEXTS` in contexts like:
    - "...1-800-656-HOPE" (HOTLINE number)
    - "...1-800-799-SAFE" (DOMESTIC VIOLENCE HOTLINE number)
    - "3-0" (Score in a game)

    The `TOP_POSITIVE_LOGITS` are all words for "women" in different languages.

    This neuron seems to activate when sequences of numbers appear, especially in the context of support hotlines or scores, and it's associated with the concept of "women".

    Let's re-evaluate. The `MAX_ACTIVATING_TOKENS` is literally just the number `5`. The `TOKENS_AFTER_MAX_ACTIVATING_TOKEN` are `3` and `6`. These numbers often appear together in lists, especially lists of phone numbers.

    **1-800-656-HOPE**

    np_acts-logits-general · gemini-2.5-flash-lite
    Configuration
    google/gemma-scope-2-27b-it/transcoder_all/layer_38_width_262k_l0_small_affine
    Prompts (Dashboard)
    238,145 prompts, 512 tokens each
    Dataset (Dashboard)
    lmsys + oasst1
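
    The source identifier in the Configuration entry above appears to encode the layer and dictionary width. The following is a minimal sketch of pulling those fields out of the string. The function name `parse_source_id`, the field names, and the assumed `layer_<n>_width_<n>k_<suffix>` pattern are illustrative guesses based on this one identifier, not a documented naming scheme.

    ```python
    import re

    # Source identifier copied from the Configuration entry above.
    SOURCE_ID = "google/gemma-scope-2-27b-it/transcoder_all/layer_38_width_262k_l0_small_affine"


    def parse_source_id(source_id: str) -> dict:
        """Split an identifier into parts (hypothetical helper, assumed pattern)."""
        org, release, kind, variant = source_id.split("/")
        match = re.fullmatch(r"layer_(\d+)_width_(\d+k)_(.+)", variant)
        if match is None:
            raise ValueError(f"unrecognized variant: {variant!r}")
        return {
            "org": org,
            "release": release,
            "kind": kind,
            "layer": int(match.group(1)),
            "width_label": match.group(2),  # kept as a label; the exact width is not stated
            "suffix": match.group(3),
        }


    info = parse_source_id(SOURCE_ID)
    print(info["layer"], info["width_label"], info["suffix"])
    ```

    The width is kept as the label `262k` rather than converted to an integer, since the page does not state the exact dictionary size.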

    Negative Logits
    Token          Value
    "Setpoint"     0.44
    " recesses"    0.44
    "ABS"          0.41
    "search"       0.40
    "arr"          0.38
    " BMI"         0.38
    " Phil"        0.38
    " wisdom"      0.38
    " abs"         0.38
    " gap"         0.37
    Positive Logits
    Token          Value
    " ಮಹಿ"         0.60
    " женщины"     0.59
    " महिलाओं"      0.59
    " महिलाएं"      0.57
    " vrouwen"     0.55
    " женщин"      0.54
    " عورت"        0.54
    " womens"      0.54
    " womans"      0.54
    " महिला"        0.53
    Activations Density 0.009%
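
    To give a sense of scale for this density, the sketch below estimates how many token positions the feature fires on. It assumes the density was measured over the dashboard prompts listed above (238,145 prompts of 512 tokens each) and that every prompt is full length. The page does not confirm either assumption, and 0.009% is itself a rounded figure, so the result is only a rough estimate.

    ```python
    # Figures copied from the dashboard; how they relate is an assumption.
    NUM_PROMPTS = 238_145
    TOKENS_PER_PROMPT = 512
    DENSITY_PERCENT = 0.009  # rounded value shown on the page

    # Total token positions, assuming every prompt has the full 512 tokens.
    total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

    # Estimated number of positions where the feature is active.
    estimated_active = total_tokens * DENSITY_PERCENT / 100

    print(total_tokens)             # 121930240
    print(round(estimated_active))  # 10974
    ```

    Under these assumptions the feature is active on roughly 11,000 of about 122 million token positions.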

    No Known Activations