© Neuronpedia 2026
    Gemma-3-12B-IT · 24-GEMMASCOPE-2-RES-16K · Feature 2531
    Explanations

    * **右前方にカラーコーンがあるので、少し右にコースを変えます。** (Migi zenpō ni karā kōn ga aru node, sukoshi migi ni kōsu o **kaemasu**.) There is a traffic cone ahead on the right, so we will **change course** slightly to the right.

The pattern links "評価" (evaluation, assessment) or "コース" (course) with "変" (change). The top activating text shows a change of course. A change of evaluation is also implied: "評価" does not appear in TOP_ACTIVATING_TEXTS, but it is one of the MAX_ACTIVATING_TOKENS.

Evidence from the lists:
- MAX_ACTIVATING_TOKENS include "評価" and "を変", and TOKENS_AFTER_MAX_ACTIVATING_TOKEN includes "えます". Read in sequence, these form "評価を変えます" (change the evaluation), with "変" followed by "えます" giving "変えます" (to change).
- TOP_ACTIVATING_TEXTS contain "コースを変えます" (change course).

The shared behavior is therefore "change" plus an object, where the object is either an assessment (評価) or a physical direction (コース). Candidate labels are "change evaluation", "change course", and "change direction".

Each candidate was checked against the explanation rules: a single word, phrase, or pattern; no token lists; no "This neuron detects/predicts"; no use of the words "tokens" or "patterns"; no initial capital unless a proper noun; specific; and consistent with the majority of examples. The rules also ask for 3 to 20 words, which both two-word candidates fall short of.

Because "評価" is itself a max-activating token, "change evaluation" is the most direct reading of the lists, while "コースを変えます" confirms the broader sense of changing course or direction.

change evaluation
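The reasoning above leans on reading the max-activating token together with the token that follows it ("を変" then "えます"). A minimal sketch of that lookup, assuming per-token activations are available as a plain list; the function name `max_token_context` and the activation values are hypothetical and not part of Neuronpedia's pipeline.

```python
from typing import List, Tuple


def max_token_context(tokens: List[str], acts: List[float],
                      n_after: int = 1) -> Tuple[str, List[str]]:
    """Return the highest-activating token and the n_after tokens that follow it.

    Hypothetical helper: mirrors the MAX_ACTIVATING_TOKENS /
    TOKENS_AFTER_MAX_ACTIVATING_TOKEN reading used in the explanation.
    """
    if not tokens or len(tokens) != len(acts):
        raise ValueError("tokens and acts must be non-empty and the same length")
    # Index of the largest activation (first one wins on ties).
    i = max(range(len(acts)), key=acts.__getitem__)
    return tokens[i], tokens[i + 1:i + 1 + n_after]


# Toy example: tokens taken from the lists above, activations made up.
tokens = ["評価", "を変", "えます", "。"]
acts = [2.1, 4.7, 0.3, 0.0]
top, after = max_token_context(tokens, acts)
print(top + "".join(after))  # prints を変えます
```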

    np_acts-logits-general · gemini-2.5-flash-lite
    Configuration
    google/gemma-scope-2-12b-it/resid_post/layer_24_width_16k_l0_medium
    Prompts (Dashboard): 238,145 prompts, 512 tokens each
    Dataset (Dashboard): lmsys + oasst1
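The configuration names a residual-stream (`resid_post`) SAE at layer 24 with a 16k-wide dictionary. As a hedged illustration of what one feature's activation is, this sketch assumes a JumpReLU encoder (the architecture of the original Gemma Scope release; that this Gemma Scope 2 SAE uses the same encoder is an assumption here). The function name, weights, bias, and threshold are hypothetical toy values, not the real SAE parameters.

```python
from typing import List


def jumprelu_feature(x: List[float], w_enc: List[float],
                     b_enc: float, threshold: float) -> float:
    """Activation of one SAE feature on one residual-stream vector.

    Assumes a JumpReLU encoder: compute the pre-activation w_enc . x + b_enc
    and keep it only if it exceeds the feature's learned threshold.
    """
    pre = sum(xi * wi for xi, wi in zip(x, w_enc)) + b_enc
    return pre if pre > threshold else 0.0


# Toy 2-d example with hypothetical parameters.
x = [1.0, 2.0]
w = [0.5, 0.25]
print(jumprelu_feature(x, w, b_enc=-0.2, threshold=0.5))  # about 0.8
print(jumprelu_feature(x, w, b_enc=-0.2, threshold=1.0))  # 0.0: below threshold
```

Unlike a plain ReLU, the threshold zeroes out small positive pre-activations, which keeps the feature sparse.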

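    Negative Logits
    " 虽然" (Chinese, "although"): 0.84
    " थी" (Hindi, "was"): 0.78
    " 但是" (Chinese, "but"): 0.73
    " नहीं" (Hindi, "not"): 0.72
    "雖然" (Chinese, traditional form, "although"): 0.72
    " 不过" (Chinese, "however"): 0.71
    "的大" (Chinese fragment, possessive 的 + "big"): 0.70
    " notwithstanding": 0.70
    "össä" (Finnish inessive ending, "in"): 0.68
    " Didn": 0.68

    Positive Logits
    "することで" (Japanese, "by doing"): 0.72
    "TH": 0.69
    " możemy" (Polish, "we can"): 0.69
    "イオン" (Japanese, "ion"): 0.68
    "サ" (Japanese katakana "sa"): 0.68
    "メカ" (Japanese, "mecha"/"mechanism"): 0.66
    "できる" (Japanese, "can, be able to"): 0.66
    "Co": 0.65
    "どのように" (Japanese, "how"): 0.65
    "いくつかの" (Japanese, "several"): 0.65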
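One common way to produce positive and negative logit lists like these (assumed here; the page does not describe its method) is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the highest and lowest scores. A minimal pure-Python sketch; `top_logits`, the decoder direction, and the toy vocabulary vectors are hypothetical. A real run would use the SAE's decoder row for feature 2531 and the model's unembedding matrix.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def top_logits(decoder_dir: List[float],
               unembed: Dict[str, List[float]],
               k: int = 3) -> Tuple[Scored, Scored]:
    """Score every token by decoder_dir . unembedding vector.

    Returns (k most positive tokens, k most negative tokens), each list
    ordered from most extreme to least extreme.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]        # lowest scores first
    positive = ranked[::-1][:k]  # highest scores first
    return positive, negative


# Hypothetical 3-d toy example (tokens borrowed from the lists above).
decoder = [1.0, 0.0, -1.0]
vocab = {
    "できる": [0.9, 0.1, -0.2],
    "することで": [0.5, 0.0, -0.3],
    " 虽然": [-0.4, 0.2, 0.5],
    " 但是": [-0.1, 0.0, 0.6],
    "Co": [0.1, 0.1, 0.1],
}
pos, neg = top_logits(decoder, vocab, k=2)
print([t for t, _ in pos])  # ['できる', 'することで']
print([t for t, _ in neg])  # [' 虽然', ' 但是']
```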
    Activation Density: 0.000%

    No Known Activations
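Activation density is usually read as the share of token positions on which a feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the page does not state its threshold or which tokens it counts); `activation_density` is a hypothetical name.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds threshold."""
    acts = list(activations)
    if not acts:
        return 0.0
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


print(f"{activation_density([0.0, 0.0, 3.2, 0.0]):.3f}%")  # prints 25.000%
print(f"{activation_density([0.0] * 1000):.3f}%")          # prints 0.000%
```

Note that at three decimal places any density below 0.0005% also displays as 0.000%, so the figure alone does not prove the feature never fires.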