    Explanations

    This neuron reliably activates on the first word of a new article or section (e.g. the opening “The,” “Museum,” or “Population”), marking the start of a topic segment.

    Negative Logits
    ega  -0.07
    olvable  -0.06
    िलत  -0.06
    ати  -0.06
    cka  -0.06
    illon  -0.06
     Cit  -0.06
     setback  -0.06
    िथ  -0.06
    _ct  -0.06
    Positive Logits
    Dire  0.07
     conjug  0.07
     rightfully  0.07
    …”  0.06
    /hash  0.06
    003  0.06
    .rs  0.06
     boring  0.06
     recv  0.06
     topics  0.06
    Act Density 0.083%

    No Known Activations