© Neuronpedia 2026

    Neuronpedia

    Qwen3-1.7B · 26-LLAMASCOPE-2-LORSA-16K-K64 · INDEX 607
    Explanations

    <THINKING>
    Method 1: MAX_ACTIVATING_TOKENS show patterns centered on “present‑day Pakistan”, “current day Pakistan”, “Indian states”, etc., but no simple source→target token mapping.
    Method 2: TOKENS_AFTER_MAX_ACTIVATING_TOKEN are largely geographic names and adjectives related to India/Pakistan (India, Indian, Punjab, etc.), indicating a focus on that region.
    Method 3: TOP_POSITIVE_LOGITS include “Indian” and other Indic‑related tokens, reinforcing the region focus. TOP_NEGATIVE_LOGITS list unrelated countries (Hawaii, Kenya, etc.), confirming suppression of non‑South‑Asian terms.
    Conclusion: Neuron activates on mentions of the Indian subcontinent (India, Pakistan, related regions).
    Method used: 2 (primary) / 3 (support).
    Explanation: Indian subcontinent
    </THINKING>
    Indian subcontinent


    Negative Logits
    " Hawai"  -25.63
    "儋"  -24.38
    "菲律宾"  -23.88
    " Kenya"  -22.75
    " Manila"  -22.25
    " Hawaiian"  -22.00
    "东盟"  -21.75
    " Hawaii"  -21.38
    " Singapore"  -21.25
    " Hait"  -21.25

    Positive Logits
    " IPP"  17.75
    "જ"  17.50
    "Indian"  17.38
    "(PC"  16.88
    "[ix"  16.75
    "省"  16.50
    "邢台"  16.25
    "省政府"  16.00
    "CPP"  15.88
    "省内"  15.88

    Activations Density 0.133%

    No Known Activations