INDEX
    Explanations

    instructions

    The neuron fires on tokens that introduce notes, cautions, or instructions (e.g. “Note,” “Ensure,” “Please,” “Although,” “While”), which serve as editorial or directive cues.

    Negative Logits
    "(vs"            -0.08
    "(le"            -0.07
    "mind"           -0.07
    "vy"             -0.07
    "CSI"            -0.06
    "appropriate"    -0.06
    "(skip"          -0.06
    " bitcoins"      -0.06
    "(it"            -0.06
    "fly"            -0.06
    Positive Logits
    " ciudad"        0.07
    (blank token)    0.07
    " blackColor"    0.06
    "/inc"           0.06
    "-scripts"       0.06
    "../../"         0.06
    (blank token)    0.06
    "illos"          0.06
    (blank token)    0.06
    " Muj"           0.06
    Activation Density: 0.070%

    No Known Activations