© Neuronpedia 2026

    Neuronpedia

    Gemma-3-27B-IT · 31-GEMMASCOPE-2-RES-262K · Index 6001
    Explanations

    I understand limits

    np_acts-logits-general · gemini-2.5-flash-lite

    sentences where the model asserts safety constraints and refuses or declines disallowed/explicit requests (e.g., "I am programmed to be a safe and helpful AI assistant" / refusal language).

    oai_token-act-pair · gpt-5-mini · Triggered by @vetterc0
    Configuration
    google/gemma-scope-2-27b-it/resid_post/layer_31_width_262k_l0_medium
    Prompts (Dashboard): 238,145 prompts, 512 tokens each
    Dataset (Dashboard): lmsys + oasst1

    Negative Logits
    Token (quoted; leading space is part of the token)    Logit
    " critters"      0.51
    " gef"           0.46
    " scallops"      0.44
    " labs"          0.43
    " gets"          0.43
    " ज्यादातर"       0.43
    " mayhem"        0.41
    "篩"             0.41
    " judgements"    0.40
    " judgments"     0.40
    Positive Logits
    Token (quoted; leading space is part of the token)    Logit
    "Instead"        0.57
    "我可以"          0.54
    " попыта"        0.49
    "Redirect"       0.48
    "拒绝"            0.48
    "redirect"       0.47
    "incerely"       0.47
    " Redirect"      0.47
    " Instead"       0.46
    " सकता"          0.46
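
For readers unfamiliar with logit lists like the ones above, here is a minimal sketch of one common way such rankings are produced: project the feature's decoder direction through the model's unembedding matrix and take the highest and lowest scoring tokens. The page does not say how its values were computed, so this is an assumption. The function `top_logit_tokens`, the toy vocabulary, and the identity unembedding matrix are all hypothetical and only serve to make the example runnable.

```python
import numpy as np

def top_logit_tokens(decoder_vec, unembed, vocab, k=10):
    """Rank vocabulary tokens by the logit contribution of one feature direction.

    decoder_vec: (d_model,) feature decoder direction (hypothetical input).
    unembed:     (d_model, vocab_size) unembedding matrix (hypothetical input).
    vocab:       list of token strings, one per unembedding column.
    """
    logits = decoder_vec @ unembed           # shape: (vocab_size,)
    order = np.argsort(logits)               # ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy data: an identity unembedding makes each logit equal to one
# component of the decoder vector, so the result is easy to check by eye.
vocab = ["Instead", " Redirect", " critters", " mayhem", "the"]
unembed = np.eye(len(vocab))
decoder_vec = np.array([2.0, 1.0, -2.0, -1.0, 0.0])

positive, negative = top_logit_tokens(decoder_vec, unembed, vocab, k=2)
print(positive)  # most promoted tokens first
print(negative)  # most suppressed tokens first
```

With a real model, `decoder_vec` and `unembed` would come from the SAE weights and the language model's output embedding; any normalization applied before the projection is not specified on this page.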
    0.46
    Activations Density 0.702%
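
To give a sense of scale for the density figure, the sketch below treats activation density as the fraction of tokens on which the feature has a positive activation, and assumes it is measured over the dashboard prompt set listed under Configuration (238,145 prompts of 512 tokens). Neither assumption is stated on the page, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(acts):
    """Fraction of token positions with a strictly positive activation."""
    acts = np.asarray(acts)
    return np.count_nonzero(acts > 0) / acts.size

# Toy check: one active token out of four.
print(activation_density([0.0, 0.0, 1.5, 0.0]))

# Rough scale under the stated assumptions.
n_prompts, tokens_per_prompt = 238_145, 512
total_tokens = n_prompts * tokens_per_prompt
approx_active = round(total_tokens * 0.702 / 100)
print(total_tokens, approx_active)
```

Under these assumptions the feature would fire on roughly 0.86 million of about 122 million tokens.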

    No Known Activations