INDEX
    Explanations

    This neuron flags directive, second-person instructional language, specifically commands addressed to “you” and the modal verbs and directive phrases (e.g. “will,” “would,” “tell you,” “simulate”) used to instruct the assistant.

    Negative Logits
    Token               Logit
    'Isl'               -0.07
    ' focused'          -0.06
    ' gravitational'    -0.06
    ' comb'             -0.06
    ' actual'           -0.06
    ' نحو'              -0.06
    'lında'             -0.06
    ' description'      -0.06
    ' substantially'    -0.06
    'درس'               -0.06
    Positive Logits
    Token               Logit
    '\E'                0.06
    '.setVisibility'    0.06
    ' ENERGY'           0.06
    ' coinc'            0.06
                        0.06
    '("['               0.06
    ' HUGE'             0.06
    ' oily'             0.06
    'サイ'              0.06
    ' decid'            0.06
    Activation Density: 0.006%

    No Known Activations