© Neuronpedia 2026

    Neuronpedia

    Gemma-3-27B-IT · 16-GEMMASCOPE-2-TRANSCODER-262K · 99820
    Explanations

    The examples show a mix of formal language, technical terms, and specific entities like names and locations. However, let's re-examine the `MAX_ACTIVATING_TOKENS` and `TOKENS_AFTER_MAX_ACTIVATING_TOKEN` more closely, as they are often the direct clues for simple patterns.
    - `should` -> `choose`
    - `Aff` -> `irm` (suggests `affirm/confirm` perhaps, or `affirmative`)
    - `arn` -> `ed` (suggests `earn/learned`, `earned/learned`)
    - `rev` -> `ised` (suggests `revise/revised`)
    - `with` -> `hung` (this one does not fit the suffix pattern)
    - `hung` -> `suspended` (this is a strong co-occurrence)
    - `Одна` (Russian for "one") -> `жды` (Russian for "times" or "occurrences") -> This together means "once" or "one time", suggesting frequency.
    - `station` -> `**` (this isn't a token, likely a formatting artifact)

    affirmed, revised, earned, suspended
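The token pairs in the explanation above can be checked against the suffix hypothesis with a short script. This is a minimal sketch and not part of any Neuronpedia tooling: the `pairs` list is copied from the explanation, and `fits_ed_suffix` is a hypothetical helper that only tests whether the following token ends in "ed".

```python
# (max-activating token, following token) pairs copied from the explanation above.
pairs = [
    ("should", "choose"),
    ("Aff", "irm"),
    ("arn", "ed"),
    ("rev", "ised"),
    ("with", "hung"),
    ("hung", "suspended"),
]


def fits_ed_suffix(next_token: str) -> bool:
    """Hypothetical check: does the following token end in the suffix 'ed'?"""
    return next_token.endswith("ed")


# Split the pairs into those that fit the crude suffix test and those that do not.
matches = [(a, b) for a, b in pairs if fits_ed_suffix(b)]
misses = [(a, b) for a, b in pairs if not fits_ed_suffix(b)]

for a, b in pairs:
    label = "fits" if fits_ed_suffix(b) else "does not fit"
    print(f"{a!r} -> {b!r}: {label}")
```

A plain string test like this is deliberately crude. It ignores tokenization details and non-English tokens, so it is only a quick sanity check on the proposed pattern.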

    np_acts-logits-general · gemini-2.5-flash-lite
    Configuration
    google/gemma-scope-2-27b-it/transcoder_all/layer_16_width_262k_l0_small_affine
    Prompts (Dashboard)
    238,145 prompts, 512 tokens each
    Dataset (Dashboard)
    lmsys + oasst1
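The source identifier in the configuration appears to encode the layer and width in its last path segment. The sketch below pulls those parts out, assuming the segment follows a `layer_<n>_width_<w>_<variant>` pattern. That pattern is inferred from this one string only, and `parse_source_id` is a hypothetical helper, not a Neuronpedia or Gemma Scope API.

```python
import re

# Source identifier copied from the Configuration section above.
source_id = "google/gemma-scope-2-27b-it/transcoder_all/layer_16_width_262k_l0_small_affine"

# Assumption: the last path segment has the form "layer_<n>_width_<w>_<variant>".
LEAF_PATTERN = re.compile(r"layer_(?P<layer>\d+)_width_(?P<width>[^_]+)_(?P<variant>.+)")


def parse_source_id(path: str) -> dict:
    """Hypothetical helper: split the identifier into its readable parts."""
    *prefix, leaf = path.split("/")
    match = LEAF_PATTERN.fullmatch(leaf)
    if match is None:
        raise ValueError(f"unrecognized leaf segment: {leaf}")
    return {
        "prefix": "/".join(prefix),
        "layer": int(match.group("layer")),
        "width": match.group("width"),
        "variant": match.group("variant"),
    }


info = parse_source_id(source_id)
print(info)
```

For the identifier above, this gives layer 16 and width `262k`, which agrees with the `16-GEMMASCOPE-2-TRANSCODER-262K` name shown for this feature.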

    Negative Logits
    ")"              0.47
    " ridiculous"    0.44
    "通道"           0.43
    "asure"          0.42
    " $)"            0.41
    " scre"          0.41
    " warmup"        0.40
    "ฏิ"              0.40
    " lenient"       0.39
    "ure"            0.39

    Positive Logits
    "DAVID"          0.50
    "</h4>"          0.49
    " diversas"      0.49
    "であった"       0.49
    " évek"          0.48
    " Estadística"   0.48
    " amplio"        0.47
    " diverso"       0.47
    " まず"          0.47
    "にお"           0.47

    Activations Density 0.000%

    No Known Activations