INDEX
    Explanations

    The neuron fires on meta-instructions that specify the assistant’s speaking style, e.g. directives containing “speech,” “informal,” “should,” or “respond.”

    Negative Logits
     aqui
    -0.06
    んでいる
    -0.06
    ored
    -0.06
     взрос
    -0.06
     encuent
    -0.06
     PendingIntent
    -0.06
     engr
    -0.06
    他們
    -0.06
    .shopping
    -0.06
     hypoc
    -0.06
    Positive Logits
     walker
    0.07
    _gt
    0.07
    (&$
    0.07
    (lock
    0.07
     jams
    0.06
     repost
    0.06
    Translate
    0.06
    олом
    0.06
    ifth
    0.06
     Cleaning
    0.06
    Act Density 0.025%

    No Known Activations