The Residual Stream

    Neuronpedia's Blog

    Assistant Axis, Gemma Scope 2, SAELens, New Circuit Tracing, and NNsight

    A Public Ask, and Collaborations with Anthropic, Google DeepMind, NDIF
    By David Chanin, Michael Hanna, Jaden Fiotto-Kaufman, Clément Dumas, Johnny Lin · January 28th, 2026

    Happy new year! Two big collaborations in this one.

    First, a public ask: We urgently need help resolving an issue with Google Cloud that's blocking a key part of our work. Standard support channels haven't been able to help. If you can connect us with someone senior and/or in management at Google Cloud, we'd be incredibly grateful. Reach out via email. Thanks!

    Onto the good stuff:

    Assistant Axis (Anthropic) is pretty interesting. Try to red-team it and watch the capped model resist drifting to harmful behaviors or sycophancy - it's not unbreakable of course, but I'm curious how easily it can be done. Warning that the "Isolation" example has a reference to self-harm, so please avoid that if you're sensitive to it.

    Gemma Scope 2 (Google DeepMind) is the sequel to the original Gemma Scope, now for Gemma 3. We have many of the dashboards and auto-interp labels up, but unfortunately we're still finishing this up for some models. We expect this to be done by Feb 14th, and we'll announce this again in the next blog post.

    We also feature initiatives by folks who do the real work behind the scenes to make Neuronpedia possible. David Chanin is our SAE whisperer, Michael Hanna is future-proofing circuit tracing by extending it to any Transformers model, and Jaden Fiotto-Kaufman and Clément Dumas' work on NNsight/NNterp powers many of our new inference backends. Our thanks to them all for dedicating their time and resources.

    Finally, it'll soon be Neuronpedia's 1000th day of existence. We'll be doing something cool for it. Stay tuned!

    In This Edition

    • 🤵🏻 Assistant Axis (Anthropic) ➡️ Launch - Monitoring and capping to the assistant persona.
    • 🔬 Gemma Scope 2 (Google DeepMind) ➡️ Launch - Gemma Scope for Gemma 3. WIP, ETA Feb 14th.
    • 🔍 SAELens (David Chanin) ➡️ GitHub - Toy SAEs, Matching Pursuit, Matryoshka, JumpReLU SAEs, v6.
    • ⚡️ Circuit Tracer (Michael Hanna) ➡️ GitHub - Support for any Transformer model, tests++.
    • 👁️ NNsight (Jaden Fiotto-Kaufman) ➡️ GitHub - Neuronpedia now uses nnsight for many models.

    🔊 Summary by NotebookLM - The Babble on [Apple] [Spotify]


    🤵🏻 Assistant Axis

    It's 10PM. Do you know if your model's drifting? [Example]

    In collaboration with Anthropic, our interactive demo lets you visualize the difference between the default and "activation capped" Llama 70B. The capped model produces fewer harmful (trigger warning: self-harm), sycophantic, and jailbroken responses. As with any other Neuronpedia steering model, you can also use your own custom chats to red-team it and share the results.
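
    For intuition only, here is a minimal sketch of what "monitoring and capping" along a single direction could look like. It is not Anthropic's or Neuronpedia's implementation: the function name `cap_along_axis`, the `lower`/`upper` bounds, and the idea of clamping a projection onto one axis are all assumptions made for illustration.

```python
import math

def cap_along_axis(hidden, axis, lower, upper):
    """Hypothetical helper: clamp the component of `hidden` along `axis`.

    Returns (new_hidden, projection_before, projection_after).
    Reading `projection_before` is the "monitoring" part; moving the
    activation so its projection lies in [lower, upper] is the "capping".
    """
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0:
        raise ValueError("axis must be non-zero")
    unit = [a / norm for a in axis]

    # Scalar projection of the activation onto the unit direction.
    proj = sum(h * u for h, u in zip(hidden, unit))

    # Clamp the projection, then shift the activation along the axis only.
    capped = min(max(proj, lower), upper)
    delta = capped - proj
    new_hidden = [h + delta * u for h, u in zip(hidden, unit)]
    return new_hidden, proj, capped

# Toy 2-d example: the first coordinate plays the role of the axis.
new_hidden, before, after = cap_along_axis([3.0, 4.0], [1.0, 0.0], -1.0, 1.0)
print(before, after, new_hidden)
```

    Only the component along the axis changes here; everything orthogonal to it is left untouched, which is the design choice this sketch is meant to show.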


    🔬 Gemma Scope 2 (ETA Feb 14)

    Gemma Scope 2 feature in gemma-3-27b-it about AI. [Example]

    In collaboration with Google DeepMind, our support for this sequel to the original Gemma Scope is focused on all sizes of Gemma 3. While the weights have been finalized, we're still finishing uploading dashboards and auto-interp labels. We'll include this in the next newsletter as well, once everything's complete (ETA Feb 14th).


    🔍 SAELens

    Training a Toy SAE on Synthetic Data with SAELens

    SAELens is Decode's library for training and using Sparse Autoencoders, actively developed by David Chanin. Since SAELens and Neuronpedia are part of the same organization, and Neuronpedia uses SAELens heavily, we'll now also cover SAELens updates in this blog, starting with these:


    ⚡️ Circuit Tracer

    circuit-tracer is Anthropic's open-source library for generating and pruning attribution graphs. Thanks to the excellent work by Michael Hanna, Jaden Fiotto-Kaufman, and David Chanin on the v0.3.1 release, Neuronpedia can now do Gemma 3 circuit tracing. Details on the new version:

    • Use any Transformers model - circuit-tracer is no longer limited to select models. With the new nnsight engine support, you can now generate graphs for any model, as long as you create a simple mapping (instructions here).
    • Significantly improved test suite - Dozens of new tests ensure that attributions, interventions, and the new engine work as expected.

    👁️ NNsight

    nnsight is a library for interpreting and intervening on models, created by NDIF. Neuronpedia is now using nnsight for several models, including gpt-oss, Gemma 3, and Llama 3.3 70B.

    We're grateful to Jaden Fiotto-Kaufman and David Bau for their support and tireless work on nnsight, and also to Clément Dumas for his terrific work on nnterp, which makes it easy to access the hook points of different models. We look forward to working more deeply with them in the future.


    As always, please contact us with your questions, feedback, and suggestions.