INDEX
    Explanations

    The neuron activates on action-oriented instruction words, particularly verbs that direct the steps of a procedure or a piece of advice.

    Negative Logits
    Token          Logit
    "_hw"          -0.07
    "naire"        -0.07
    "tabla"        -0.07
    " whe"         -0.07
    "_marker"      -0.06
    " maxWidth"    -0.06
    " tak"         -0.06
    "ungal"        -0.06
    "ε"            -0.06
    "_magic"       -0.06
    Positive Logits
    Token          Logit
    ".What"         0.07
    (unrendered)    0.06
    " anom"         0.06
    " слишком"      0.06
    "ustom"         0.06
    " Awards"       0.06
    ".Man"          0.06
    " "','"         0.06
    (unrendered)    0.06
    "شف"            0.06
    Activation Density: 0.235%
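    Activation density is the share of token positions on which the neuron is active. The page does not state its cutoff, so this minimal sketch assumes a position counts as active when its activation is strictly greater than 0.0. The function name `activation_density` and the toy data are hypothetical.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not the page's documented cutoff.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy data: 1 active position out of 1,000.
    acts = [0.0] * 999 + [0.3]
    print(f"{activation_density(acts):.3f}%")  # prints 0.100%
```

    Under this definition, a density of 0.235% means the neuron fires on roughly 2.35 of every 1,000 tokens.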

    No Known Activations