    Explanations

    The neuron activates on sentence-initial tokens, especially capitalized transition words that start a new sentence (e.g. “In”, “Next”, “This”, “That’s”).
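To make the described pattern concrete, here is a minimal sketch that picks out capitalized words at the start of a sentence. It is only a text-level heuristic and does not reproduce the neuron's actual behavior or tokenizer; the name `sentence_initial_words` and the regular expression are assumptions introduced for illustration.

```python
import re

# Hypothetical heuristic (an assumption, not taken from the dashboard):
# a capitalized word at the start of the text, or right after
# sentence-ending punctuation followed by whitespace.
SENTENCE_START = re.compile(r"(?:^|(?<=[.!?])\s+)([A-Z][\w’']*)")


def sentence_initial_words(text: str) -> list[str]:
    """Return the capitalized words that open a sentence in `text`."""
    return SENTENCE_START.findall(text)


if __name__ == "__main__":
    sample = "In the morning we left. Next, we stopped for fuel. This took an hour."
    print(sentence_initial_words(sample))
```

Words that are capitalized mid-sentence (names, for instance) are not matched, because the pattern requires either the start of the text or preceding sentence-ending punctuation.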

    Negative Logits
     mockery        -0.07
     EN             -0.07
    _FINAL          -0.06
    (){}↵           -0.06
     быть           -0.06
     coupling       -0.06
    ________________________________________________________________  -0.06
     +↵             -0.06
     peanut         -0.06
    .weixin         -0.06
    Positive Logits
     клуб           0.07
    (token not rendered)  0.06
    stdafx          0.06
     titular        0.06
    (token not rendered)  0.06
     Prahy          0.06
     непосред       0.06
    (token not rendered)  0.06
     Dickinson      0.06
     kodu           0.06
    Activation Density: 0.192%

    No Known Activations