INDEX
    Explanations

    The neuron does not respond to particular words or concepts. Instead, it tracks how far into the current text segment a token appears: activation is low at the start of a segment and rises toward the middle and end.
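
One way to check a positional explanation like this is to compute each token's offset within its segment and correlate it with the neuron's activations. The sketch below is illustrative only: the function names and the choice of a newline token as the segment boundary are assumptions, not something the dashboard specifies.

```python
import math
from typing import List, Sequence


def segment_offsets(tokens: Sequence[str], boundary: str = "\n") -> List[int]:
    """Return each token's 0-based offset within its segment.

    Assumption: a segment ends at a `boundary` token, and the boundary
    token itself is counted as the last token of the segment it closes.
    """
    offsets = []
    pos = 0
    for tok in tokens:
        offsets.append(pos)
        # After a boundary token, the next token starts a new segment.
        pos = 0 if tok == boundary else pos + 1
    return offsets


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)
```

Given real per-token activations for this neuron, `pearson(segment_offsets(tokens), activations)` would be expected to be clearly positive if the explanation holds. A correlation near zero would suggest the neuron responds to something other than position.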

    Negative Logits
    Token          Logit
    "td"           -0.07
    "ером"         -0.06
    "iren"         -0.06
    "ackages"      -0.06
    ".react"       -0.06
    " add"         -0.06
    "ipsoid"       -0.06
    " motion"      -0.06
    "ーマ"          -0.05
    " etmiştir"    -0.05

    Positive Logits
    Token          Logit
    " dildo"       0.06
    "عم"           0.06
    "Clients"      0.06
    " domác"       0.06
    "\tOptional"   0.06
    " Sloan"       0.06
    "_ASSOC"       0.06
    " 한번"         0.06
    "¯Â"           0.06
    " ры"          0.06
    Activation Density: 1.831%
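
Assuming activation density means the fraction of sampled tokens on which the neuron fires at all, a figure of 1.831% would correspond to roughly 18 active tokens per 1,000. A minimal sketch of that calculation follows; the function name and the `threshold` parameter are hypothetical, and the dashboard's exact definition may differ.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)
```

For example, a synthetic list such as `[0.0, 0.0, 0.3, 0.0]` has one active token out of four, so its density is 0.25.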

    No Known Activations