© Neuronpedia 2026

    Neuronpedia

    Gemma-3-4B-IT · 12-GEMMASCOPE-2-TRANSCODER-16K · Feature 11
    Explanations

    language indicating progression, advancement, improvement, or dominance of a subject over time or in comparison to alternatives.

    oai_token-act-pair · claude-4-5-haiku · Triggered by @ruq2026

    The marked tokens across these examples are predominantly adverbial or verbal elements that indicate continuation, progression, or intensification of states and actions. They include words and phrases such as "continues," "gaining momentum," "improving," "increasingly," "steadily," and "growing," all of which suggest ongoing processes or escalating trends. The pattern reflects language that describes dynamic change, development, or persistence over time across varied contexts, including technological advancement, historical progression, market trends, and skill development.

    eleuther_acts_top20 · claude-4-5-haiku · Triggered by @ruq2026
    Configuration
    google/gemma-scope-2-4b-it/transcoder_all/layer_12_width_16k_l0_small_affine
    Prompts (Dashboard): 238,145 prompts, 512 tokens each
    Dataset (Dashboard): lmsys + oasst1

    Negative Logits
    " Czechoslovakia"   0.15
    " Connecticut"      0.14
    " Chrysler"         0.14
    " Charlotte"        0.13
    " refreshments"     0.13
    " époque"           0.13
    " Cleveland"        0.13
    " anod"             0.13
    " memorial"         0.13
    " Ark"              0.13

    Positive Logits
    " tekan"            0.13
    " tighten"          0.13
    " መን"               0.13
    " पुरे"              0.13
    "ウ"                0.13
    "clen"              0.12
    "rench"             0.12
    "激"                0.12
    "吘"                0.12
    " जोरदार"            0.12
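Positive and negative logit lists like these are commonly read as the vocabulary tokens whose logits the feature's output most raises or lowers. One common way to derive such lists is a "logit lens" projection: take the dot product of the feature's decoder direction with each column of the model's unembedding matrix, then keep the highest and lowest scores. The sketch below assumes that approach; the function name `top_logit_effects`, the toy shapes, and the omission of any final normalization are illustrative assumptions, not Neuronpedia's documented pipeline.

```python
import numpy as np


def top_logit_effects(decoder_vec, unembed, vocab, k=10):
    """Rank tokens by the dot product of one feature's decoder direction
    with each column of the unembedding matrix (a logit-lens view).

    decoder_vec: (d_model,) hypothetical output direction of the feature
    unembed:     (d_model, vocab_size) hypothetical unembedding matrix
    vocab:       list of vocab_size token strings
    Returns (positive, negative), each a list of (token, score) pairs.
    """
    scores = np.asarray(decoder_vec) @ np.asarray(unembed)  # one score per token
    order = np.argsort(scores)  # token indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy usage with random weights (not the real model's weights).
rng = np.random.default_rng(0)
vocab = [f"tok{i}" for i in range(20)]
pos, neg = top_logit_effects(rng.normal(size=8), rng.normal(size=(8, 20)), vocab, k=3)
print(pos)
print(neg)
```

The sketch returns signed scores, so its bottom-k entries are negative numbers and its top-k entries are the largest positive ones.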
    Activation Density: 0.960%
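Activation density is usually read as the fraction of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero and that density is measured over a batch of fixed-length prompts (both are assumptions; the page does not state the threshold or the sample used). The function name `activation_density` is hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    acts: array of shape (n_prompts, n_tokens) with one feature's activations.
    The default threshold of 0.0 is an assumption, not a documented setting.
    """
    acts = np.asarray(acts)
    return float((acts > threshold).mean())


# Toy example: 4 prompts of 512 tokens, feature active on 20 positions.
acts = np.zeros((4, 512))
acts[0, :20] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.977% (20 of 2048 positions)
```

For scale, a density of 0.960% corresponds to roughly one firing per 104 token positions.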
