    Explanations

    historical events

    This neuron consistently activates on non-English, Romance-language tokens (words or subwords from Italian or Portuguese text), which suggests it detects when the text switches out of English.

    Negative Logits
    _STENCIL        -0.07
     commitments    -0.07
     dm             -0.07
     سود            -0.07
    Zend            -0.07
    061             -0.06
     promoters      -0.06
     мор            -0.06
    /reset          -0.06
    _HELPER         -0.06
    Positive Logits
    (blank          0.07
    .onerror        0.06
     #-}↵↵          0.06
    ویش             0.06
    buffer          0.06
    \"\             0.06
    [next           0.06
    ULONG           0.06
    '''↵            0.06
    ("--            0.06
    Act Density 0.033%

    No Known Activations