INDEX
    Explanations

    referencing previous information

    The neuron fires on self-referential cue phrases about the assistant’s prior remarks (e.g. “mentioned in my previous response,” “above”).

    Negative Logits
    Token                   Logit
    " reminders"            -0.08
    "Message"               -0.07
    "unknown"               -0.06
    "NoSuch"                -0.06
    " Teddy"                -0.06
    "ंदर"                   -0.06
    " carc"                 -0.06
    " cancellationToken"    -0.06
    ".rd"                   -0.06
    " button"               -0.06
    Positive Logits
    Token                   Logit
    "SCO"                   0.07
    (blank)                 0.06
    " تل"                   0.06
    " Auswahl"              0.06
    " crumbling"            0.06
    "ução"                  0.06
    " Illustrated"          0.06
    " 호텔"                  0.06
    " AUX"                  0.06
    " фор"                  0.06
    Activation Density: 0.131%

    No Known Activations