The Residual Stream

    Neuronpedia's Blog

    Llama 3.3 70B, Temporal Feature Analysis, Featured Research, gpt-oss-20b, and *a lot more*

    We Waited Too Long to Release an Update Post 😅
    By Johnny Lin · November 12th, 2025

    In This Edition

    🔊 Summary by NotebookLM - The Babble on [Apple] [Spotify]


    🦙 Llama 3.3 70B Instruct

    By popular demand, we're supporting our biggest model yet, just in time for your ICLR rebuttals! It launches with initial support for Goodfire's SAE.

    • How to Use
      • Use the Neuronpedia package: pip install neuronpedia
      • Or call the API directly with these parameters (a minimal request sketch follows this list):
        • modelId = "llama3.3-70b-it" | source/layer = "50-resid-post-gf" | sourceSet = "resid-post-gf"
    • Status
      • ✅ Supported Functionality (See Cookbook)
        • Dashboards + Autointerp Labels ➡️ Release Page
        • Activations for a single feature with custom prompt (API)
        • Top activating features from all features with custom prompt (API)
        • Top activating features by token from all features with custom prompt (API)
        • NEW: Batch Inference - send up to 4 prompts (256 tokens max per prompt) in one request (use array of prompts in API)
        • Steering is now enabled as of Nov 22nd
        • All other APIs are now available - Search via Inference, TopK by Token, etc.
    • 📫 Feedback/Requests - What else do you want this API to do?
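
    Here's a minimal sketch of a batch-inference request using Python's requests library. The modelId, source, and sourceSet values come from the list above; the endpoint path, auth header, and the remaining payload field names are assumptions for illustration, so check the Cookbook for the exact routes and schema.

```python
# Minimal sketch of a batch-inference request for Llama 3.3 70B Instruct.
# modelId / source / sourceSet are the values listed above; the endpoint
# path, auth header, and other field names are ASSUMPTIONS for illustration
# only -- see the Cookbook for the real routes and payload schema.
import os
import requests

payload = {
    "modelId": "llama3.3-70b-it",
    "source": "50-resid-post-gf",        # source/layer
    "sourceSet": "resid-post-gf",
    "prompts": [                         # batch inference: up to 4 prompts,
        "The Golden Gate Bridge is in",  # 256 tokens max per prompt
        "The capital of France is",
    ],
}

resp = requests.post(
    "https://www.neuronpedia.org/api/<endpoint>",  # placeholder route (assumption)
    headers={"x-api-key": os.environ["NEURONPEDIA_API_KEY"]},  # assumed auth header
    json=payload,
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```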

    ⏰ Temporal Feature Analysis (Lubana, Rager, Hindupur, et al.)

    ➡️ Open Demo Interface | Release | Tweet

    • Background - Temporal Feature Analysis (paper, notebook) is an interpretability method designed to capture both contextual and local information in language model representations.
    • ✅ Supported Functionality
      • Similarity Matrix Visualization (a toy illustration follows this list)
        • Custom text (example) or use a demo preset (example)
        • Hover over tokens or cells to see the corresponding cell or token
        • Link directly to your custom similarity matrix by clicking "Share"
      • Supported Models
      • Steering Enabled
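
    As a rough illustration of the object being visualized, here's a generic token-by-token cosine-similarity matrix computed from a small model's hidden states. This is not the Temporal Feature Analysis estimator from the paper (see the notebook for that); gpt2 here is just a stand-in model, and the snippet only shows the kind of matrix the interface renders.

```python
# Toy illustration only: a token-by-token cosine-similarity matrix over
# per-token hidden states from gpt2 (a stand-in model). This is NOT the
# Temporal Feature Analysis estimator itself -- see the paper/notebook.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

text = "The cat sat on the mat because the mat was warm"
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state[0]    # [seq_len, d_model]

hidden = hidden / hidden.norm(dim=-1, keepdim=True)  # unit-normalize rows
similarity = hidden @ hidden.T                       # [seq_len, seq_len]

print(tok.convert_ids_to_tokens(inputs["input_ids"][0]))
print(similarity.round(decimals=2))
```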

    🤖 gpt-oss-20b

    To kick off support for gpt-oss-20b, we have BatchTopK SAEs by Andy Arditi.
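
    For context, BatchTopK replaces the usual per-example top-k activation with a batch-level one. Below is a toy sketch of that activation rule only, not Andy Arditi's training code: keep the top k × batch_size pre-activations across the whole batch and zero the rest.

```python
# Toy sketch of the BatchTopK activation rule: keep the top (k * batch_size)
# SAE pre-activations across the whole batch, zero everything else. This
# illustrates the activation function only, not the released training setup.
import torch

def batchtopk(preacts: torch.Tensor, k: int) -> torch.Tensor:
    """preacts: [batch, d_sae] pre-activations; k: average latents per example."""
    n_keep = k * preacts.shape[0]
    threshold = torch.topk(preacts.flatten(), n_keep).values.min()
    return preacts * (preacts >= threshold)

# Example: 4 examples, 16 latents each, an average of 3 active latents per example.
acts = batchtopk(torch.randn(4, 16).relu(), k=3)
print((acts > 0).sum().item())  # typically k * batch_size = 12 nonzero latents
```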

    • Release ➡️ neuronpedia.org/gpt-oss-sae
      • Dashboards (chat + non-chat SAEs) + autointerp labels
      • These are the trainer_0 in the Huggingface repo
    • Status
      • ✅ Most API calls supported, including inference
      • ❌ Steering (ETA: Nov 14th - 15th)
      • NEW: Dashboards now have a new toggle: "Show Raw Tokens" (displays special tokens) vs "Show Formatted" (formats special tokens into role and content).

    😈 Misaligned Persona in Open Weight Models

    Arditi and Chen's reproduction of Misaligned Persona, for Qwen and Llama. [LessWrong Post]


    🔬 Research using Neuronpedia: Automated Circuit Interpretation via Probe Prompting

    Giuseppe Birardi

    Preprint | LW Post | Interactive Demo | Github Repo

    Attribution Graph Probing automates feature interpretation with probe prompting, turning CLT attribution graphs into compact subgraphs of concept-aligned supernodes, validated via Neuronpedia Replacement/Completeness scores (≈0.54/0.83). On US-capitals circuits it beats geometric clustering on interpretability and reveals an early-vs-late layer split (early layers generalize; late Say-X features specialize).
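
    As a heavily simplified illustration of the probe-prompting idea: score each graph feature against concept-specific probe prompts, then merge features by best-matching concept into supernodes. The helper function and the grouping rule below are assumptions, not the paper's actual pipeline; see the preprint and repo for the real thing.

```python
# Heavily simplified sketch of probe prompting: score each feature on a set of
# concept probe prompts, then group features by best-matching concept into
# "supernodes". get_feature_activation() is a hypothetical stand-in for an
# activation query (e.g. via the Neuronpedia API); the grouping rule here is
# an assumption, not the paper's actual pipeline.
from collections import defaultdict

def get_feature_activation(feature_id: str, prompt: str) -> float:
    """Hypothetical helper: max activation of `feature_id` on `prompt`."""
    raise NotImplementedError("wire this up to an activation backend")

def group_into_supernodes(feature_ids, concept_probes):
    """concept_probes: {concept_name: [probe prompt, ...]}."""
    supernodes = defaultdict(list)
    for fid in feature_ids:
        scores = {
            concept: sum(get_feature_activation(fid, p) for p in probes) / len(probes)
            for concept, probes in concept_probes.items()
        }
        supernodes[max(scores, key=scores.get)].append(fid)
    return dict(supernodes)
```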

    The demo is mostly push-button, and the repo makes it easy to scale to many graphs.

    Highlighted automated subgraphs:


    📈 Attribution Graph Updates

    • NEW: Zoom and pan the subgraph using the buttons on the bottom left, or pinching / click and drag
    • NEW: Graph and subgraph replacement and completeness scores, at the bottom of the subgraph
    • NEW: Qwen3-4B - Steering from graph now enabled
    • NEW: "Remix" Graph
      • Click "Remix" to edit the current prompt for a graph and generate a new graph from it.
      • For Gemma 2 2B, you can also switch between using the Gemma Scope Transcoders and the Anthropic Fellows' Cross Layer Transcoders to quickly compare the difference.
    • NEW: Download the labels along with the graph - click Info, then "Download Graph JSON + Labels" (a quick loading sketch follows this list)
    • NEW: Simplified UI for graph generation (click "Advanced" for other adjustments)
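
    If you grab the "Download Graph JSON + Labels" export, here's a quick sketch for poking at it locally. The file path is yours to fill in, and since we don't assume any particular export schema, it just lists the top-level keys and their sizes.

```python
# Quick look at a downloaded "Graph JSON + Labels" export. No schema is
# assumed beyond it being JSON: we just list the top-level keys and sizes.
import json

with open("my-graph.json") as f:  # path to the file you downloaded
    graph = json.load(f)

if isinstance(graph, dict):
    for key, value in graph.items():
        size = len(value) if hasattr(value, "__len__") else value
        print(f"{key}: {size}")
else:
    print(type(graph), len(graph))
```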

    🏷️ Gemma Scope Auto-Interp Label Refresh + 27B Dashboards

    • NEW: Added dashboards for 3 layers of Gemma 2 27B

    • REFRESH: We've refreshed many Gemma Scope autointerp labels, using a new General Explainer designed/tested with thinking models (like Gemini 2.5 Flash) for multiple models, layers, and widths:

      | Model | Hook and Width | Layers | Example |
      |---|---|---|---|
      | gemma-2-27b (New!) | gemmascope-res-131k (New!) | 10, 22, 34 | Link |
      | gemma-2-2b | gemmascope-res-16k | 16, 18, 20, 22, 24 | Link |
      | gemma-2-2b | gemmascope-res-65k | 16, 18, 20, 22, 24 | Link |
      | gemma-2-9b | gemmascope-res-16k | 24, 26, 28, 30, 32 | Link |
      | gemma-2-9b | gemmascope-res-131k | 24, 26, 28, 30, 32 | Link |
      | gemma-2-9b-it | gemmascope-res-16k | 9, 20, 31 | Link |
      | gemma-2-9b-it | gemmascope-res-131k | 9, 20, 31 | Link |

    As with all data on Neuronpedia, exports (including dashboards, labels, and more) are publicly available.


    🎁 Other Updates

    • There's now an Available Resources page to see what models and sources are available for inference.
    • You may have noticed this post is "bullet point"-heavy, with fewer graphics and paragraphs. We hope this format lets us write update posts more efficiently, so we can post smaller updates more often instead of cramming everything into one big post.
    • It was Neel Nanda's birthday on Monday - our thanks for his awesome work and support.
    • I guess we did a lot of auto-interp:

    https://neuronpedia.s3.us-east-1.amazonaws.com/site-assets/blog/a-lotta-autointerp.jpg

    As always, please contact us with your questions, feedback, and suggestions.