INDEX
    Explanations

    punctuation

    This neuron detects assistant-generated text (tokens marking the assistant's responses).

    Negative Logits
    -0.07
     lava
    -0.07
     stitched
    -0.07
    _modifier
    -0.07
    -0.07
    哪裡
    -0.07
     besten
    -0.07
     משמעות
    -0.07
    spd
    -0.07
     Venus
    -0.07
    Positive Logits
    0.07
     nw
    0.07
    0.07
    Rule
    0.06
    .Types
    0.06
    Res
    0.06
    Campaign
    0.06
    דבר
    0.06
    起到了
    0.06
    0.06
    Act Density 0.125%

    No Known Activations