INDEX
    Explanations

    This neuron activates on sentence-initial tokens, such as the first word of a sentence or a marker that opens it.

    Negative Logits
     wor          -0.07
    иболее        -0.07
     dishonest    -0.07
     Docs         -0.07
    Reminder      -0.07
    ificação      -0.07
     Fakat        -0.06
     “[           -0.06
    _cash         -0.06
                  -0.06
    Positive Logits
    Faces         0.06
     Holds        0.06
     scoop        0.06
    (equal        0.06
    zte           0.06
     resign       0.06
    uplicates     0.06
     hứ           0.06
    NA            0.06
    rhs           0.06
    Act Density 0.064%

    No Known Activations