    Explanations

    This neuron fires on the gerund “responding” (as in “responding to”) appearing in instruction phrases.

    Negative Logits
    Token        Logit
    "Cast"       -0.06
    (blank)      -0.06
    "incy"       -0.06
    "111"        -0.06
    " jobs"      -0.06
    " Ung"       -0.06
    " baked"     -0.06
    "Segoe"      -0.06
    (blank)      -0.06
    " beds"      -0.06
    Positive Logits
    Token           Logit
    " nullptr"      0.07
    " Canadian"     0.07
    " freshness"    0.07
    " empower"      0.07
    "](↵"           0.07
    "писание"       0.07
    " impacting"    0.07
    "oubted"        0.06
    "strlen"        0.06
    " noisy"        0.06
    Activation Density: 0.018%

    No Known Activations