    Neuronpedia

    © Neuronpedia 2025
    Andy Arditi · GPT-OSS BatchTopK SAEs · GPT-OSS-20B · Resid Post - 131k · 11-RESID-POST-AA · 30563
    Explanations

    racial bias

    np_max-act · gemini-2.0-flash

    This neuron detects mentions of race and racial-group topics, especially content about racial identity, discrimination, representation, or related controversies.

    oai_token-act-pair · gpt-5-mini · Triggered by @tatsatx

    Explanation could not be parsed.

    eleuther_acts_top20 · gpt-5-nano · Triggered by @tatsatx

    Explanation could not be parsed.

    eleuther_acts_top20 · gpt-5-mini · Triggered by @tatsatx
    Configuration
    andyrdt/saes-gpt-oss-20b/resid_post_layer_11/trainer_0
    Dataset (Dashboard)
    Various
    No Configuration Found

    Negative Logits
    " leaf"        -0.08
    " losse"       -0.08
    " folos"       -0.08
    " postfix"     -0.07
    "roy"          -0.07
    " borrowed"    -0.07
    " topo"        -0.07
    " אית"         -0.07
    " overw"       -0.07
    "ויה"          -0.07

    Positive Logits
    " racial"          0.20
    "racial"           0.17
    "黑人"             0.16
    " racism"          0.16
    " ethnic"          0.16
    " ethnicity"       0.15
    " racist"          0.15
    " minorities"      0.14
    " multicultural"   0.14
    " LGBTQ"           0.14

    Activations Density 0.291%

    No Known Activations