INDEX
    Explanations

    This neuron detects question- and command-starter tokens in math problems: words like “What,” “Divide,” “Calculate,” “Work,” “Round,” “Convert,” and similar prompt verbs.

    Negative Logits
    opus
    -0.07
    (!(
    -0.06
    шем
    -0.06
    ी।↵
    -0.06
    short
    -0.06
    ็นว
    -0.06
    €€
    -0.06
     embroidery
    -0.06
    typings
    -0.06
    ському
    -0.06
    Positive Logits
    (__('
    0.06
    =back
    0.06
     cita
    0.06
    edral
    0.06
     ry
    0.06
     boo
    0.06
     hac
    0.06
    quartered
    0.06
    FillColor
    0.06
     있을
    0.06
    Act Density 0.013%

    No Known Activations