    Explanations

    punctuation marks

    The neuron flags tokens that belong to the assistant's (AI's) response segments; that is, it detects when text comes from the assistant rather than the user.

    Negative Logits
    "ades"         -0.06
    "instances"    -0.06
    "ạm"           -0.06
    " heir"        -0.06
    "questions"    -0.06
    ".enabled"     -0.06
    " Rates"       -0.06
    " düşman"      -0.06
    "radio"        -0.06
    (blank token)  -0.06
    Positive Logits
    (blank token)  0.07
    " QVERIFY"     0.07
    "КТ"           0.07
    "roveň"        0.07
    ":@{"          0.07
    ")/("          0.06
    (blank token)  0.06
    " bied"        0.06
    "ाहत"          0.06
    " smooth"      0.06
    Act Density 0.035%

    No Known Activations