© Neuronpedia 2026

    Neuronpedia

    GPT2-Small · Transcoders Residuals · 8-TRES-DC · 911
    Explanations

    phrases indicating risk or danger to individuals or groups

    oai_token-act-pair · gpt-4o-mini · Triggered by @bot

    Negative Logits
    " Rise"         -0.76
    "universal"     -0.70
    " Courage"      -0.65
    " flares"       -0.64
    " Heights"      -0.62
    "APS"           -0.62
    "peat"          -0.62
    "CLUS"          -0.62
    "anth"          -0.61
    "xon"           -0.60
    Positive Logits
    "aber"           0.77
    "slave"          0.72
    "serv"           0.66
    " collateral"    0.65
    " prey"          0.65
    "zeb"            0.65
    " expend"        0.64
    " bait"          0.62
    " partners"      0.62
    "need"           0.61
    Activations Density 1.978%
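
The page does not define the density figure. As a rough sketch, assuming it is the percentage of sampled token positions on which the feature's activation is nonzero, it could be computed as below. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from Neuronpedia.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activation values strictly above `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the source page does not state its definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 1,000 token positions, 20 of which fire.
sample = [0.0] * 980 + [1.5] * 20
density = activation_density(sample)
print(f"{density:.3f}%")  # prints 2.000%
```

Under that reading, a density of 1.978% would mean the feature is active on roughly 2 of every 100 sampled tokens.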

    No Known Activations