    Explanations

    The neuron fires on frequent function words and other common “structural” tokens (articles, conjunctions, simple verbs, and prepositions) rather than on domain-specific content.

    Negative Logits
     buzz
    -0.07
    -0.07
    LARI
    -0.07
     apologies
    -0.06
    ...'
    -0.06
    .Chart
    -0.06
    ificates
    -0.06
     فقط
    -0.06
     genotype
    -0.06
    ECH
    -0.06
    Positive Logits
    ngine
    0.07
    mie
    0.07
     (![
    0.07
     приб
    0.07
    0.07
    0.07
    (dirname
    0.07
    WIN
    0.06
     المن
    0.06
     layui
    0.06
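On dashboards of this kind, the positive and negative logit lists are commonly obtained by projecting the neuron's output weight vector through the model's unembedding matrix and ranking the per-token scores. The sketch below assumes exactly that, with no layer-norm folding or rescaling. It uses a toy two-dimensional model, and the names `w_out`, `w_u`, and `top_logits` are hypothetical.

```python
from typing import List, Tuple

Scored = List[Tuple[str, float]]


def top_logits(w_out: List[float], w_u: List[List[float]],
               vocab: List[str], k: int = 10) -> Tuple[Scored, Scored]:
    """Project a neuron's output direction onto the unembedding and rank tokens.

    w_out: the neuron's output weights, length d_model (hypothetical input).
    w_u:   unembedding matrix, shape d_model x vocab_size (hypothetical input).
    Returns (most negative k, most positive k) as (token, score) pairs.
    """
    d_model = len(w_out)
    # Direct logit effect per vocabulary entry: dot(w_out, column j of w_u).
    scores = [sum(w_out[i] * w_u[i][j] for i in range(d_model))
              for j in range(len(vocab))]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]            # lowest scores first
    positive = ranked[::-1][:k]      # highest scores first
    return negative, positive


if __name__ == "__main__":
    # Toy model: d_model = 2, vocabulary of three tokens.
    w_out = [1.0, -0.5]
    w_u = [[0.1, -0.2, 0.3],
           [0.2, 0.4, -0.1]]
    neg, pos = top_logits(w_out, w_u, ["a", "b", "c"], k=1)
    print("negative:", neg)
    print("positive:", pos)
```

With this toy data, `b` gets the most negative score (-0.4) and `c` the most positive (about 0.35). A real dashboard may also normalize these scores, which would explain why the values above are all close to ±0.07.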
    Act Density 0.039%
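Activation density is read here as the percentage of token positions on which the neuron's activation is above zero. The threshold of 0.0 is an assumption, and `activation_density` is a hypothetical helper. At 0.039%, that works out to roughly 39 active positions per 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption about what counts as "active".
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 39 active positions out of 100,000 gives 0.039%.
    acts = [1.0] * 39 + [0.0] * (100_000 - 39)
    print(f"{activation_density(acts):.3f}%")
```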

    No Known Activations