    Explanations

    This neuron is effectively “dead” (it never activates for any input tokens).

    Negative Logits
    Token             Logit
    " endeavors"      -0.08
    " decorators"     -0.07
    " fz"             -0.06
    "orent"           -0.06
    " fsm"            -0.06
    "izen"            -0.06
    " Lightweight"    -0.06
    "ологіч"          -0.06
    "fir"             -0.06
    " switched"       -0.06

    Positive Logits
    Token             Logit
    "_FAILED"          0.07
    "ΡΑ"               0.07
    (not rendered)     0.06
    " maximizing"      0.06
    (not rendered)     0.06
    " складу"          0.06
    "()");↵"           0.06
    " 보면"            0.06
    "*:"               0.06
    " úspěš"           0.06
    Activation Density: 0.076%

    No Known Activations
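
The density figure reported above can be read, under the usual convention, as the fraction of sampled tokens on which the neuron's activation is nonzero. The page does not say how the figure was computed, so the following is only a minimal sketch of that convention. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means (number of active tokens) / (number of tokens).
    """
    values = list(activations)
    if not values:
        # No tokens sampled: report zero rather than dividing by zero.
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical sample: 10,000 tokens, 8 of which have a nonzero activation.
sample = [0.0] * 9992 + [0.5] * 8
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this definition, a neuron described as "effectively dead" would be expected to show a density at or very near zero, which fits the small value listed here.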