    Explanations

    The neuron's activation increases steadily with distance into a generated or quoted text, so it effectively detects “later” or “deep” positions in the token sequence.
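One way to sanity-check an explanation like this is to correlate the neuron's activation with token position. The following is a minimal sketch, not the dashboard's own method: the function name `position_correlation` and the sample activation lists are hypothetical, and it assumes you already have one activation value per token for a single text.

```python
from statistics import mean


def position_correlation(activations):
    """Pearson correlation between token index and activation value.

    `activations` is assumed to hold one value per token, in sequence order.
    A result near 1.0 is consistent with "fires more at later positions".
    """
    n = len(activations)
    if n < 2:
        raise ValueError("need at least two tokens")
    positions = range(n)
    mean_pos = mean(positions)
    mean_act = mean(activations)
    cov = sum((p - mean_pos) * (a - mean_act)
              for p, a in zip(positions, activations))
    var_pos = sum((p - mean_pos) ** 2 for p in positions)
    var_act = sum((a - mean_act) ** 2 for a in activations)
    if var_act == 0:
        # A constant activation carries no positional signal.
        return 0.0
    return cov / (var_pos * var_act) ** 0.5


# Hypothetical activations, made up for illustration only.
rising = [0.0, 0.1, 0.2, 0.3, 0.4]
flat = [0.2, 0.2, 0.2, 0.2, 0.2]

print(position_correlation(rising))
print(position_correlation(flat))
```

A strictly linear ramp such as `rising` should give a correlation of about 1, while a constant sequence such as `flat` gives 0. Real activations would fall somewhere in between, and a high correlation alone would not rule out other explanations.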

    Negative Logits
        " psychologists"    -0.07
        " syn"              -0.07
        " transformer"      -0.06
        " anti"             -0.06
        " 영국"             -0.06
        "yon"               -0.06
        "actal"             -0.06
        " Streams"          -0.06
        "65"                -0.06
        "823"               -0.06
    Positive Logits
        " maxx"             0.08
        "../"               0.07
        ".ms"               0.06
        "ине"               0.06
        " Redistributions"  0.06
        "register"          0.06
        "итай"              0.06
        "following"         0.06
        "contrib"           0.06
        ".hide"             0.06
    Activation Density: 0.067%

    No Known Activations