© Neuronpedia 2026

    Neuronpedia

    Gemma-3-27B-IT / 53-GEMMASCOPE-2-RES-262K / 47322
    Explanations

    Thought:

    The user wants me to explain the behavior of a neuron based on the provided lists. The explanation needs to be concise (3-20 words), a phrase, and identify a pattern or concept.

    Let's re-examine the lists:

    **MAX_ACTIVATING_TOKENS:** /s*thereachescuterr
    **TOKENS_AFTER_MAX_ACTIVATING_TOKEN:** Threatthoraxththickathroat)*ThoughtThought
    **TOP_POSITIVE_LOGITS:** ٹ, ṭ, Т, Thank, տ, ط, ту, тор, ਤ, ټ
    **TOP_ACTIVATING_TEXTS:**
    - "Hate/Threatening"
    - "thorax"
    - "thickest part of the thigh reads 165°F"
    - "reaches a threshold"
    - "cutthroat environments"
    - "err" (in code)
    - "Deep Thought"

    Let's try to find common threads.

    1. **"Threatening"**: Appears in `TOP_POSITIVE_LOGITS` (implicitly via the Arabic/Cyrillic characters that can be related to 'T' or 'Th' sounds, and 'Thank' is there too). 'Threat' is in `TOKENS_AFTER_MAX_ACTIVATING_TOKEN`.
    2. **"thorax"**: Appears in `TOKENS_AFTER_MAX_ACTIVATING_TOKEN`. This is a specific body part.
    3. **"thick"**: Appears in `TOKENS_AFTER_MAX_ACTIVATING_TOKEN` ("thickest part").
    4. **"cutthroat"**: Derived from 'cut' in `MAX_ACTIVATING_TOKENS` and 'throat' in `TOKENS_AFTER_MAX_ACTIVATING_TOKEN`.
    5. **"reaches"**: Appears in `MAX_ACTIVATING_TOKENS` and `TOP_ACTIVATING_TEXTS` ("reaches a threshold").
    6. **"Thought"**: Appears in `TOKENS_AFTER_MAX_ACTIVATING_TOKEN` and `TOP_ACTIVATING_TEXTS` ("Deep Thought").
    7. **"err"**: Appears in `MAX_ACTIVATING_TOKENS` and `TOP_ACTIVATING_TEXTS` (code context).
    8. **TOP_POSITIVE_LOGITS**: Many of these are characters representing the 'T' or 'Th' sound across different scripts (Arabic, Cyrillic, Armenian, Gurmukhi, etc.). 'Thank' also fits this.

    The common element seems to be words starting with 'T' or 'Th', or sounds associated with them.
    - Threat (T)
    - thorax (Th)
    - thick (Th)
    - throat (Th)
    - reaches (R - but often followed by words starting with T, like "reaches the threshold")
    - cutthroat (C...T)
    - Thought (Th)
    - err (E - code related, less thematic)

    The `TOP_POSITIVE_LOGITS` are heavy on characters related to 'T' or 'Th'. The `TOKENS_AFTER_MAX_ACTIVATING_TOKEN` also often start with 'T' or 'Th' ('Threat', 'thorax', 'th', 'thick', 'throat', 'Thought').

    The pattern seems related to words starting with the letter 'T', or the sound 'Th'. Many of these words have a more severe or definitive meaning in the given texts: Threat, threshold, cutthroat, Deep Thought (implies a significant concept).

    Let's try to synthesize this.
    - "t or th" - too simple, not specific enough.
    - "t-words or th-words" - better, but still a bit descriptive.

    Considering the examples:
    - threat
    - thorax
    - thick
    - throat
    - reaches threshold
    - cutthroat
    - Deep Thought

    These are all words starting with common 'T' or 'Th' sounds. The `TOP_POSITIVE_LOGITS` reinforce this by showing many characters that represent these sounds.

    The phrase should capture this.

    Final explanation: t or th words
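    The explainer's final label can be checked mechanically against the token lists it cites. The Python below is a minimal sketch, not part of Neuronpedia's pipeline: it assumes a token "starts with t" when, after trimming whitespace, applying Unicode NFKD decomposition, and case-folding, its first character is a Latin `t`. The helper names `starts_with_t` and `share_starting_with_t` are hypothetical, and the token lists are copied from the trace.

```python
import unicodedata


def starts_with_t(token: str) -> bool:
    """Assumed rule: after stripping whitespace, NFKD-decomposing
    (so 'ṭ' becomes 't' + combining dot), and case-folding, the token
    begins with a Latin 't'. Non-Latin T-like letters do not match."""
    base = unicodedata.normalize("NFKD", token.strip()).casefold()
    return base.startswith("t")


def share_starting_with_t(tokens: list[str]) -> float:
    """Fraction of tokens that satisfy starts_with_t."""
    return sum(starts_with_t(tok) for tok in tokens) / len(tokens)


# Tokens named in the explainer's trace.
tokens_after_max = ["Threat", "thorax", "th", "thick", "throat", "Thought"]
positive_logits = ["ٹ", "ṭ", "Т", "Thank", "տ", "ط", "ту", "тор", "ਤ", "ټ"]

print(share_starting_with_t(tokens_after_max))  # 1.0
print(share_starting_with_t(positive_logits))   # 0.2
```

    Under this assumption every cited post-activation token matches, while only `ṭ` and `Thank` among the top positive logits do. The Cyrillic, Arabic, Armenian, and Gurmukhi T-like letters would need a script-aware or phonetic mapping to count toward the label.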

    np_acts-logits-general · gemini-2.5-flash-lite
    Configuration
    google/gemma-scope-2-27b-it/resid_post/layer_53_width_262k_l0_medium
    Prompts (Dashboard)
    238,145 prompts, 512 tokens each
    Dataset (Dashboard)
    lmsys + oasst1


    Negative Logits
     thé        0.86
     thee       0.74
     θε         0.73
     ther       0.72
     Theo       0.70
     Thé        0.69
     Θε         0.69
     theses     0.68
     Theodore   0.66
     therm      0.65
    Positive Logits
    ٹ          0.51
    ṭ          0.47
    Т          0.47
    Thank      0.46
    տ          0.42
    ط          0.42
    ту         0.41
    тор        0.41
    ਤ          0.41
    ټ          0.40
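    The page does not say how the logit lists are computed. A common heuristic is to project the feature's decoder vector through the model's unembedding matrix and rank vocabulary entries by the resulting score. The sketch below shows that ranking with NumPy on made-up toy matrices; `top_logits`, `decoder_dir`, and `W_U` are hypothetical names, and a real run would need this SAE's decoder row and Gemma-3-27B-IT's unembedding, neither of which is included here.

```python
import numpy as np


def top_logits(decoder_dir, W_U, vocab, k=10):
    """Assumed method: score each vocabulary entry by projecting one SAE
    decoder direction through the unembedding, then return the k highest
    and k lowest (token, signed score) pairs."""
    scores = decoder_dir @ W_U          # shape: (vocab_size,)
    order = np.argsort(scores)          # ascending by score
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    return positive, negative


# Toy example with invented values: d_model = 4, five-token vocabulary.
rng = np.random.default_rng(0)
vocab = ["Thank", " thé", " thee", "Т", "cat"]
W_U = rng.normal(size=(4, len(vocab)))   # hypothetical unembedding
decoder_dir = rng.normal(size=4)         # hypothetical decoder row

pos, neg = top_logits(decoder_dir, W_U, vocab, k=2)
print(pos)
print(neg)
```

    The sketch returns signed scores, so the "negative" entries come out below zero.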
    Activation Density 0.022%
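    As a rough scale check, assume the 0.022% density is measured over every token of the dashboard prompt set (238,145 prompts of 512 tokens). The page does not state the denominator, and BOS or padding tokens may be excluded. Under that assumption, the expected number of active tokens works out as follows; all variable names are illustrative.

```python
# Assumption: density = share of all dashboard tokens on which the feature fires.
prompts = 238_145
tokens_per_prompt = 512
density = 0.022 / 100  # 0.022% as a fraction

total_tokens = prompts * tokens_per_prompt
expected_active = total_tokens * density

print(total_tokens)            # 121930240
print(round(expected_active))  # 26825
```

    Because 0.022% is itself rounded, the true count could differ by a few percent, so treat the figure as an order-of-magnitude estimate.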
