
    Neuronpedia

    © Neuronpedia 2025
    EXPLANATION TYPE
    np_max-act
    Description
    A Neuronpedia original that forces concise explanations and shows the model the top activating tokens and texts. A simpler version of np_max-act-logits.
    Author
    Neuronpedia
    URL
    https://github.com/hijohnnylin/automated-interpretability/blob/917b11e38111c43526fe03ae6094a7081aeb982a/neuron_explainer/explanations/explainer.py#L1181
    Settings
    Activations shown: 24 tokens around the max activation. The model is also shown the max-activating token.
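    The windowing described in Settings can be sketched roughly as follows. This is a minimal illustration, not the code at the URL above: the function name `window_around_max`, the roughly centred window, and the edge handling are all assumptions, since the page only states that 24 tokens around the max activation are shown together with the max-activating token.

```python
from typing import List, Tuple


def window_around_max(
    tokens: List[str],
    activations: List[float],
    window: int = 24,  # "24 tokens around max act" from the Settings line
) -> Tuple[List[str], str]:
    """Return up to `window` tokens around the peak activation, plus the peak token.

    Hypothetical helper: the centring and clamping rules are assumptions.
    """
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    if not tokens:
        return [], ""

    # Index of the strongest activation (the first one wins on ties).
    peak = max(range(len(activations)), key=activations.__getitem__)

    # Start half a window before the peak, clamped to the start of the text.
    start = max(0, peak - window // 2)
    end = min(len(tokens), start + window)
    # If the window hit the end of the text, shift it left to keep it full.
    start = max(0, end - window)

    return tokens[start:end], tokens[peak]
```

    For a 100-token text whose peak sits at position 50, this returns tokens 38 through 61 and the token at position 50. Texts shorter than 24 tokens are returned whole.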
    Recent Explanations
    Bro
    gemini-2.5-flash
    <|begin_of_text|>Brooks Kubik is the
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 93282
    opinions, preferences, evaluations
    gemini-2.5-flash
    bank or a rip-off, IMO.↵↵There's no
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 108911
    many
    gemini-2.5-flash
    ?\n\nBecause it had too many problems.\n\nI hope you
    LLAMA3.3-70B-IT
    50-RESID-POST-GF
    INDEX 44911
    exclamation mark
    gemini-2.5-flash
    was outstanding in his field!<|eot_id|><|start_header_id|>user<|end_header_id|>↵↵
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 73126
    office
    gemini-2.5-flash
    decides. It’s your office and you should be the
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 84480
    future
    gemini-2.5-flash
    <|begin_of_text|>She moaned on his
    LLAMA3.1-8B-IT
    23-RESID-POST-AA
    INDEX 40600
    their
    gemini-2.5-flash
    <|begin_of_text|>She moaned on his
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 102078
    cute
    gemini-2.5-flash
    <|begin_of_text|>Cute girl Daisy desires to become
    LLAMA3.1-8B-IT
    19-RESID-POST-AA
    INDEX 9825
    cute
    gemini-2.5-flash
    best-selling literary genre where cute love. We have always
    LLAMA3.1-8B-IT
    23-RESID-POST-AA
    INDEX 103940
    cute
    gemini-2.5-flash
    is one thing chocolate, cute boys and flowers just don
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 10092
    attention
    gemini-2.5-flash
    by NAME_1's attention, but he was also
    LLAMA3.1-8B-IT
    7-RESID-POST-AA
    INDEX 128840
    you
    gemini-2.5-flash
    .↵↵---↵↵Do you want me to:↵↵
    GEMMA-3-4B-IT
    29-GEMMASCOPE-2-RES-65K
    INDEX 1242
    common words or tokens
    gemini-2.5-flash
    pattern may include the sub-steps of: comparing the
    GEMMA-3-27B
    16-GEMMASCOPE-2-RES-16K
    INDEX 1
    first-person statements
    gemini-2.5-flash
    very good gardeners. I’m sure this tree is
    LLAMA3.1-8B-IT
    19-RESID-POST-AA
    INDEX 83504
    opposite/anti responses
    gemini-2.5-flash
    "Do you understand?"<|eot_id|><|start_header_id|>assistant<|end_header_id|>↵↵No!
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 33274
    0
    gpt-5
     all xe2x80x9cmandatoryxe
    GEMMA-2-9B
    21-GEMMASCOPE-RES-16K
    INDEX 14393
    so
    claude-4-5-sonnet
     combine all the elements presented so far an show how,
    GEMMA-2-2B
    20-GEMMASCOPE-RES-16K
    INDEX 1
    self
    gpt-5
    or specialty addictions treatment, self-monitoring)\↵*
    GEMMA-2-2B
    13-GEMMASCOPE-RES-16K
    INDEX 7367
    Copyright
    deepseek-r1
    =============================================↵    Copyright (c) 2
    GEMMA-2-2B
    13-GEMMASCOPE-RES-16K
    INDEX 11930
    or
    o4-mini
    Traffic Prioritization," or "Bandwidth Management."
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 81285