Neuronpedia is the free and open platform for interpretability research. Search, test, explore, and upload your data.
    What's AI Interpretability?
    Today, hundreds of millions of people use AIs like ChatGPT, but nobody, not even the engineers who created them, knows exactly how they think or how to reliably steer them away from harming humans. This is because modern AIs were created through a process similar to evolution, using extremely powerful computers. Understanding how AIs think is the field of interpretability, and steering them is alignment. Both are important for AI safety.
    AI safety? Are you trying to take away my GPUs?
    No. We want to help understand and align extremely powerful AI, since there is no guarantee that the AI will be friendly to humanity.
    Why Neuronpedia?
    The goal of Neuronpedia is to be the "Wikipedia for AI Interpretability". But why is this needed?
    • Automatic Frontend for Your Data: Researchers are busy enough doing research. They shouldn't have to cobble together a frontend to make their work presentable and usable before publishing their paper. Neuronpedia wants to make publishing your interpretability research as easy as clicking Upload.
    • One Central, Standardized, Searchable Database: Interpretability researchers have usually created their own custom websites to upload and display their data. For example, OpenAI's Neuron Viewer and Neel Nanda's Neuroscope host similar types of data, but with totally different interfaces, APIs, and data structures. Neuronpedia is the free and open website where all interpretability data can be uploaded, searched, compared, linked to, exported, and tested.
    • Crowdsourcing and Peer Testing: Neuronpedia is used by (and created by!) people who are fascinated by AI interpretability. Neuronpedia's tools and game allow anyone to contribute to explaining, verifying, and analyzing your data. Someone might even find interesting things in your data that you didn't notice! (And vice versa.)
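    As a sketch of what a standardized, exportable database makes possible, here is a minimal example of working with exported feature data in Python. The URL pattern and the JSON record shape are assumptions for illustration only, not Neuronpedia's documented API.

```python
import json

BASE_URL = "https://www.neuronpedia.org"  # assumed base URL

def feature_url(model: str, layer: str, index: int) -> str:
    """Build a link to one feature page, e.g. gpt2-small@9-res-jb:454.
    The path layout here is an assumption, not a documented route."""
    return f"{BASE_URL}/{model}/{layer}/{index}"

def top_activation_texts(feature_json: str, n: int = 3) -> list[str]:
    """Pull the n highest-activating snippets from a feature record
    in a hypothetical exported-JSON shape."""
    record = json.loads(feature_json)
    acts = sorted(record["activations"],
                  key=lambda a: a["max_act"], reverse=True)
    return [a["text"] for a in acts[:n]]

# A hand-made record in the assumed shape:
sample = json.dumps({
    "model": "gpt2-small",
    "layer": "9-res-jb",
    "index": 454,
    "activations": [
        {"text": "laid-back", "max_act": 4.2},
        {"text": "sold-out", "max_act": 3.9},
        {"text": "grown-up", "max_act": 3.1},
    ],
})

print(feature_url("gpt2-small", "9-res-jb", 454))
print(top_activation_texts(sample, n=2))
```

    Because every record shares one schema, the same few lines work across models and uploaders — the point of a central database rather than per-lab custom sites.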
    Who does Neuronpedia benefit?
    If AI alignment goes well, then Neuronpedia benefits everyone. But most immediately, Neuronpedia benefits interpretability researchers and those who are curious about the inner workings of AI models. As a project that's mostly supported by short-term grants from nonprofits, Neuronpedia is also committed to openness and does not sell data - all data uploaded and contributed is free to use. If you'd like a specific data export, please contact us.
    Do you support directions/features?
    Yes, we do - including support for browsing, viewing, testing, and uploading directions/features. We currently have directions from OpenAI, Joseph Bloom, and Cunningham et al.
    Who are you?
    Neuronpedia is created by Johnny Lin - I'm an ex-Apple engineer who previously founded a privacy startup. AI is fascinating - and the impact it will have should not be underestimated.
    How can I help?
    • Researchers: Use Neuronpedia for your models and directions. Email us and let us know what would be the most useful features for you. APIs? New ways to test neurons? New visualizations? Plugins?
    • Everyone Else: Poke around AI brains. Find cool and interesting neurons. Comment and star them.
    • The Wealthy: Donate to Neuronpedia - don't make us put up a Jimmy Wales-style banner at the top of every neuron.
    The Neuronpedia game (crowdsourced understanding of AI) is currently less actively maintained, as we are focused on building out research features.
    How It Works
    1. Explain
    Crowdsource explanations for the neurons inside AI.
    2. Verify
    Verify explanations that other people have submitted.
    3. Understand AI
    Open data for interpretability, alignment, and safety projects.