© Neuronpedia 2026

    Neuronpedia

    Gemma-3-12B-IT · 31-GEMMASCOPE-2-RES-16K · Index 5731
    Explanations

    * **Modern x86 Assembly Language Programming**
    * **Modern Family (ABC)**
    * **ur_modern_driver** ROS package
    * **standard modern studio headphones**
    * **Modern Standard Arabic - MSA**
    * **modern slang**
    * **modern systems for monitoring**
    * **modern saress**
    * **modern appliance**
    * **modern woman**

    The common thread is the word "modern" followed by a noun or descriptive term. The neuron seems to recognize the concept of "modern" in various contexts, often preceding specific types of things (family, slang, systems, appliance, woman, etc.).

    Let's select the most concise and representative phrase. "modern" is the core; the following words are the subjects of "modern".

    Possible explanations:
    - modern [noun]
    - modern things
    - modern concepts
    - modern usage
    - modern examples

    The prompt asks for a phrase that captures what the neuron *detects or predicts by finding patterns in lists*.

    The explicit token is "modern".
    The top positive logits are about "modern", "modernos", "day", "conveniences".
    The activating texts show "Modern x86", "Modern Family", "ur_modern_driver", "standard modern studio headphones", "Modern Standard Arabic", "modern slang", "modern systems", "modern saress", "modern appliance", "modern woman".
    The most direct and specific pattern is the presence of the word "modern" and what it modifies or precedes.

    Considering the rules:
    - Concise (3-20 words): Yes.
    - Phrase, not a full sentence: Yes.
    - Captures what neuron detects/predicts: Yes, it detects the usage of "modern".
    - Finds patterns in lists: Yes.
    - Specific: Yes.
    - Avoids "tokens", "patterns", "detects/predicts": Yes.
    - No capitalization unless proper noun: Yes.

    Let's look at the `TOKENS_AFTER_MAX_ACTIVATING_TOKEN`: x, Family, _, studio, Standard, slang, systems, sa, appliance, woman. These are all *types* of things described as modern.

    The `TOP_POSITIVE_LOGITS` are: conveniences, কালে, izr, izing, day, modern, Modern, কালের, day, modernos. The most frequent and relevant ones are "modern", "Modern", "modernos", "day".

    The phrase needs to capture the *behavior* of the neuron:
    - It is strongly activated by the word "modern".
    - It is strongly associated with words like "day", "conveniences", "modernos".
    - It appears in contexts describing modern versions of things or modern concepts.

    How about framing it as detecting current or recent things? "modern" is the key. Let's try to simplify based on the core token and what it is associated with in the output. The phrase should describe what the neuron *represents*. "modern" is the most apparent signal; the subsequent tokens are diverse but often nouns. Just "modern" is too short (one word, while the rule requires 3-20 words). "modern and related terms" is too generic.

    Let's revisit `TOKENS_AFTER_MAX_ACTIVATING_TOKEN`. These words *follow* "Modern" or "modern": Family, studio, slang, systems, appliance, woman. These are all concrete or conceptual categories. What if the neuron is detecting "modern" followed by a noun or category? Phrases like:
    - modern + noun
    - modern categories
    - types of modern things

    Let's look at `TOP_POSITIVE_LOGITS` again: conveniences, day, modern, Modern, modernos. "conveniences" and "day" are words that can describe things or concepts of time; "modernos" is the Spanish and Portuguese plural of "moderno".

    The examples in `TOP_ACTIVATING_TEXTS` are quite diverse:
    - Modern x86 Assembly Language Programming
    - Modern Family
    - ur_modern_driver
    - standard modern studio headphones
    - Modern Standard Arabic
    - modern slang
    - modern systems
    - modern saress
    - modern appliance
    - modern woman

    It is about "modern" plus a specific noun related to a domain (tech, entertainment, language, fashion, household). Perhaps the neuron is about "modern" *usage* or "modern" *versions*? "modern versions of things" is a bit long, but captures it.

    Let's refine. The neuron identifies the concept `modern`. The top logits are `conveniences`, `day`, `modern`, `modernos`, which suggests a focus on recent times or the present. The `TOKENS_AFTER_MAX_ACTIVATING_TOKEN` give specific examples of what "modern" modifies.

    Let's try to combine the core word "modern" with the idea of *what* is modern:
    - "modern and its applications": maybe too broad.
    - "modern topics and concepts": better.

    What about the specific examples? Family, Studio, Standard, Slang, Systems, Appliance, Woman are all nouns. A very direct explanation is just "modern" and the types of things it precedes: "modern terms and items" could work.

    Let's consider specific domains:
    - Tech: x86 Assembly, driver, systems, appliance
    - Entertainment: Family
    - Language: slang, Arabic
    - Fashion: saress

    The neuron signals the use of "modern" when referring to contemporary examples or specific domains: "modern specific examples", "modern specific domains", "modern examples and concepts".

    Looking at `TOP_POSITIVE_LOGITS` again:
    - `conveniences` -> modern amenities
    - `day` -> modern times / contemporary

    It seems to be about identifying "modern" and its context or application. The phrase should be short and punchy. If the neuron detects "modern" followed by a noun, phrases like "modern noun types" or "modern specific nouns" come to mind. But the logits also contain "conveniences", "day", "modern", "modernos".

    Consider the possibility that it is about *contemporary* things in general: "current topics and examples", "contemporary usage and items".

    Given that `MAX_ACTIVATING_TOKENS` is almost exclusively "modern", the neuron is highly sensitive to this token. The other lists inform *what kind* of modern it is associated with.

    Let's go with a phrase that includes "modern" and hints at its context: "modern concepts and items", "modern terms and categories", "modern examples and domains".

    Re-reading the rules: "The explanation could be about words starting with a sequence." The neuron is strongly activated by "Modern", and the tokens after it are diverse nouns. How about focusing on the *types* of things described as modern? "modern product and concept types" is too long. Let's simplify: "modern specific categories" implies modern + noun/category.

    Consider the core, repeated word "Modern". What follows are varied categories. The `TOP_POSITIVE_LOGITS` are "conveniences", "day", "modern", "modernos". This suggests `modern` + [noun], and also `modern` as a descriptor of general present-day concepts or

    np_acts-logits-general · gemini-2.5-flash-lite
    Configuration
    google/gemma-scope-2-12b-it/resid_post/layer_31_width_16k_l0_medium
    Prompts (Dashboard)
    238,145 prompts, 512 tokens each
    Dataset (Dashboard)
    lmsys + oasst1

    Negative Logits
    `йде` 0.67
    `当前` 0.62
    ` longstanding` 0.62
    ` '{}` 0.61
    ` toplam` 0.60
    `\|^` 0.60
    `름` 0.59
    `ટ` 0.59
    ` current` 0.58
    ` kivy` 0.58
    Positive Logits
    ` conveniences` 0.84
    `কালে` 0.82
    `izr` 0.80
    `izing` 0.78
    `day` 0.77
    `modern` 0.77
    `Modern` 0.75
    `কালের` 0.73
    ` day` 0.71
    ` modernos` 0.71
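The page does not say how the logit lists above are produced. A common approach for sparse-autoencoder features, assumed here rather than confirmed for this dashboard, is to project the feature's decoder direction through the model's unembedding matrix and read off the highest- and lowest-scoring vocabulary entries. The following sketch shows that idea with a hypothetical four-token vocabulary and made-up weights; `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom` are illustrative names, not Neuronpedia or SAELens APIs.

```python
# Minimal sketch (assumption): score each vocabulary entry by the dot product of
# a feature's decoder direction with that entry's column of the unembedding
# matrix. All vectors and the vocabulary below are hypothetical toy values.


def logit_effects(decoder_dir, unembed, vocab):
    """Return (token, score) pairs, where score = decoder_dir . unembed[:, j]."""
    d_model = len(decoder_dir)
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring and the k lowest-scoring (most negative) entries."""
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Toy setup: d_model = 3 and a 4-token vocabulary; unembed has shape d_model x vocab.
vocab = [" modern", " day", " current", " longstanding"]
unembed = [
    [1.0, 0.5, -1.0, -0.5],
    [0.0, 0.5, 0.0, -0.5],
    [0.5, 0.0, -0.5, 0.0],
]
decoder_dir = [0.6, 0.2, 0.4]

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
print("positive:", positive)
print("negative:", negative)
```

With these toy values, `positive` ranks ` modern` then ` day`, and `negative` ranks ` current` then ` longstanding`. A real model would use its full vocabulary and a `d_model`-sized decoder row; any normalization or sign convention the dashboard applies to its displayed values is not modeled here.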
    Activation density: 0.061%
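To put the density figure in concrete terms, the following back-of-the-envelope estimate assumes (the page does not confirm this) that the 0.061% was measured over the dashboard prompt set listed under Configuration, 238,145 prompts of 512 tokens each. Because the displayed density is rounded, the result is only approximate; `estimated_active_tokens` is a hypothetical helper.

```python
# Hypothetical estimate: assumes the displayed activation density applies to the
# dashboard prompt set (238,145 prompts x 512 tokens). Not confirmed by the page.


def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, expected number of tokens on which the feature fires)."""
    total_tokens = num_prompts * tokens_per_prompt
    active = total_tokens * density_percent / 100
    return total_tokens, active


total, active = estimated_active_tokens(238_145, 512, 0.061)
print(f"{total:,} tokens, about {active:,.0f} active")
```

Under that assumption this prints `121,930,240 tokens, about 74,377 active`, i.e. the feature fires on roughly one token in 1,600.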
