
    Neuronpedia

    © Neuronpedia 2025
    EXPLANATION TYPE
    np_max-act-logits
    Description
    A Neuronpedia original that attempts to replicate the automated interpretability (autointerp) method Anthropic used to label the features in its attribution graphs paper.
    Author
    Neuronpedia
    URL
    https://github.com/hijohnnylin/automated-interpretability/blob/4463a9fab7d4828bfd4c33194e64856b95377166/neuron_explainer/explanations/explainer.py#L811-L1135
    Settings
    Activations shown: 24 tokens around the max-activating token. The top 10 logits are shown. The explainer model is also shown the max-activating token.
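
The settings above can be illustrated with a minimal sketch. It is not the code from the linked explainer.py: the function names `window_around_max` and `top_k_logits` are hypothetical, and it assumes the 24-token window is roughly centred on the max-activating token and clipped at the text boundaries.

```python
from typing import List, Mapping, Sequence, Tuple


def window_around_max(
    tokens: Sequence[str],
    activations: Sequence[float],
    window: int = 24,
) -> Tuple[List[str], str]:
    """Return up to `window` tokens around the max-activating token, plus that token.

    Assumption: the window is centred on the peak and shifted inward
    when the peak sits near the start or end of the text.
    """
    if not tokens or len(tokens) != len(activations):
        raise ValueError("tokens and activations must be non-empty and equal in length")
    # Index of the token with the highest activation.
    peak = max(range(len(activations)), key=activations.__getitem__)
    start = max(0, peak - window // 2)
    end = min(len(tokens), start + window)
    # If the window was clipped on the right, extend it back to the left.
    start = max(0, end - window)
    return list(tokens[start:end]), tokens[peak]


def top_k_logits(logits: Mapping[str, float], k: int = 10) -> List[str]:
    """Return the k tokens with the largest logit values, highest first."""
    ranked = sorted(logits.items(), key=lambda item: item[1], reverse=True)
    return [token for token, _ in ranked[:k]]
```

In this sketch, the returned window and the max-activating token would be placed in the explainer prompt alongside the top 10 logit tokens. How the real explainer formats them is not specified here.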
    Recent Explanations
    confessions
    claude-3-7-sonnet-20250219
    65 Ark. L. Rev. 799 (
    DEEPSEEK-R1-DISTILL-LLAMA-8B
    15-LLAMASCOPE-SLIMPJ-RES-32K
    INDEX 8093
    assembly attributes
    o4-mini
    ("")]↵[assembly: AssemblyCulture("")]
    GEMMA-2-2B
    1-CLT-HP
    INDEX 12928
    surnames
    claude-3-7-sonnet-20250219
    rol H. Stambler, Los Angeles, CA
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 40525
    each
    gpt-4o
    . It is up to each of us to define our
    LLAMA3.1-8B-IT
    27-RESID-POST-AA
    INDEX 9263
    mild
    gpt-4o-mini
    X-ray image showing mild RA, diagnosed in 
    GEMMA-2-2B
    0-CLT-HP
    INDEX 10
    or
    o4-mini
    Traffic Prioritization," or "Bandwidth Management."
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 81285
    Russian names, culture
    gemini-2.0-flash
    before the match "Yuri, have a heart and
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 16206
    Russian names
    gemini-2.5-flash-lite
    before the match "Yuri, have a heart and
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 16206
    Russian names
    gemini-2.5-flash
    before the match "Yuri, have a heart and
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 16206
    gaslighting abuse
    gemini-2.5-flash-lite
    their own feelings, instincts, and sanity, which gives
    LLAMA3.1-8B-IT
    11-RESID-POST-AA
    INDEX 16205
    say programming syntax
    claude-3-5-haiku-20241022
    KIND, either express or implied.↵# See
    QWEN3-4B
    27-TRANSCODER-HP
    INDEX 1198
    brands and products
    gemini-2.0-flash
    e.g., washing machines, dishwashers, refrigerators
    GPT-OSS-20B
    23-RESID-POST-AA
    INDEX 130989
    Turkish language
    gemini-2.0-flash
    if decret) yolu, istisnai ve sınırl
    GPT-OSS-20B
    23-RESID-POST-AA
    INDEX 131008
    breeding
    gemini-2.0-flash
    ding. | Created the breeding ground for the first **
    GPT-OSS-20B
    23-RESID-POST-AA
    INDEX 131037
    JSON serialization/deserialization
    gemini-2.0-flash
    Text(path);↵ return JsonSerializer.Deserialize<Quiz>(
    GPT-OSS-20B
    23-RESID-POST-AA
    INDEX 130972
    Wishing good luck in Chinese
    gemini-2.0-flash
    。祝你编码顺利!<|return|>
    GPT-OSS-20B
    23-RESID-POST-AA
    INDEX 131061
    ans
    gemini-2.0-flash
    pigeons that carried the ansamblu of the city
    GPT-OSS-20B
    23-RESID-POST-AA
    INDEX 130960
    foreign language
    gemini-2.0-flash
    ija** – VID ir atbildīga par iem
    GPT-OSS-20B
    23-RESID-POST-AA
    INDEX 131051
    Code and math expressions
    gemini-2.0-flash
    LIKE CONCAT('%', LOWER(?), '%') ");↵
    GPT-OSS-20B
    23-RESID-POST-AA
    INDEX 131038
    Iraq
    gemini-2.0-flash
    techniques, ancient city founding (Nineveh, Babylon,
    GPT-OSS-20B
    23-RESID-POST-AA
    INDEX 131069