    Explanations

    forms of "to be"

    This neuron fires on the assistant’s standard self-introductions and offers of help, i.e. phrases like “I’m here to help/assist you.”

    Negative Logits
    Token            Logit
    "_reward"        -0.07
    "Independent"    -0.06
    " brittle"       -0.06
    (blank)          -0.06
    " Chic"          -0.06
    "Pas"            -0.06
    "isinin"         -0.06
    "Tes"            -0.06
    "Spin"           -0.06
    " gm"            -0.06
    Positive Logits
    Token            Logit
    " exacerb"       0.07
    " atheist"       0.06
    "kB"             0.06
    " escalated"     0.06
    "Ž"              0.06
    "-ready"         0.06
    " gehen"         0.06
    " буд"           0.06
    " lda"           0.06
    "odable"         0.06
    Activation Density 0.013%
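    To put the density figure in concrete terms, here is a minimal sketch. It assumes the value is the percentage of sampled tokens on which the neuron's activation is above zero; the page does not state how it is computed. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical. Under that assumption, 0.013% corresponds to about 13 active tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a positive
    activation; the source page does not define the metric.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 13 active tokens out of 100,000.
sample = [0.0] * 99_987 + [1.5] * 13
density = activation_density(sample)
print(f"{density:.3f}%")
```

    A neuron this sparse fires on very few tokens, which may be why the page lists no known activations.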

    No Known Activations