INDEX
    Explanations

    The neuron detects meta-instructions or system-level directives telling the assistant how to format or complete its answer.

    Negative logits
     burada       -0.07
    _end          -0.07
    biased        -0.07
     phối         -0.07
    uran          -0.06
    achte         -0.06
    .books        -0.06
    vail          -0.06
    assigned      -0.06
    ises          -0.06

    Positive logits
     Kingdom       0.07
     kab           0.07
     blended       0.06
     lstm          0.06
    _TBL           0.06
     microsoft     0.06
    REM            0.06
                   0.06
     (\<           0.06
     chocolate     0.06
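    The page does not say how these logit values were computed. Lists like these are commonly produced by a "logit lens" on the neuron's output weights: the output direction is multiplied by the model's unembedding matrix, and the tokens with the most negative and most positive scores are reported. The sketch below assumes that method. The function `top_logits`, the nested-list matrix layout, and the toy weights and vocabulary are all hypothetical illustrations, not the page's actual pipeline or data.

```python
# Hypothetical sketch: assumes the logit lists come from projecting the
# neuron's output weight vector through the unembedding matrix.

def top_logits(direction, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    direction: the neuron's output weights, length d_model (assumed).
    unembed:   d_model x vocab_size matrix as nested lists (assumed layout).
    vocab:     token strings, one per unembedding column.
    """
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the direction with token t's unembedding column.
        score = sum(direction[d] * unembed[d][t] for d in range(len(direction)))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]  # descending by score
    return most_negative, most_positive


# Toy example with made-up weights (not the real model's values).
direction = [1.0, -0.5]
unembed = [
    [0.2, -0.1, 0.0, 0.3],
    [0.1, 0.4, -0.2, 0.0],
]
vocab = ["a", "b", "c", "d"]
neg, pos = top_logits(direction, unembed, vocab, k=2)
print(neg)
print(pos)
```

    In the toy example, token "b" receives the most negative score and "d" the most positive. In a real model the vocabulary has tens of thousands of entries, so a vectorized matrix product would replace the Python loop.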
    Activation density: 0.006%
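    Activation density is typically the fraction of token positions on which the neuron is active. The page does not define "active", so the sketch below assumes a hypothetical threshold of 0.0 and uses made-up activation data. The helper `activation_density` is illustrative only.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    The threshold of 0.0 is an assumption; the page does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy example: 6 active positions out of 100,000 tokens (made-up data).
acts = [0.0] * 100_000
for i in (10, 500, 2_000, 40_000, 75_000, 99_999):
    acts[i] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.006%
```

    Under this definition, a density of 0.006% means the neuron fires on about 6 of every 100,000 tokens.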

    No Known Activations