INDEX
    Explanations

    questions/discussions

    This neuron activates on tokens within detailed, step-by-step explanatory or elimination reasoning sections of the assistant’s answers.

    NEGATIVE LOGITS
     isc
    -0.07
     rop
    -0.07
     exam
    -0.07
    -0.06
    isse
    -0.06
    .gwt
    -0.06
     Pandora
    -0.06
     addObserver
    -0.06
    airro
    -0.06
     losses
    -0.06
    POSITIVE LOGITS
    -interface
    0.08
    >You
    0.06
    การท
    0.06
    ?>/
    0.06
    +m
    0.06
    ۱۵
    0.06
     ~
    0.06
    //↵
    0.06
    //--------------------------------------------------------------------------------
    0.06
    จะต
    0.06
    Act Density 0.030%

    No Known Activations