INDEX
    Explanations

    common English words

    This neuron is effectively “dead” – it never activates and so isn’t detecting any pattern in the text.

    Negative Logits
    Token                 Logit
    " zijn"               -0.07
    "Ubergraph"           -0.07
    " Cater"              -0.07
    " responsibilities"   -0.07
    " Removes"            -0.07
    " putas"              -0.07
    [token not shown]     -0.06
    " ull"                -0.06
    " exchange"           -0.06
    " ruins"              -0.06
    Positive Logits
    Token                 Logit
    "iệng"                0.06
    " disagrees"          0.06
    "ことで"              0.06
    "ystick"              0.06
    "(EXIT"               0.06
    ".mouse"              0.06
    " edil"               0.06
    "NZ"                  0.06
    " Route"              0.06
    "recursive"           0.06
    Activation Density: 0.058%

    No Known Activations