INDEX
    Explanations

    instructions

    This neuron detects the beginning of the assistant’s reply, especially greeting and speaker‐tag tokens (e.g. “assistant,” “Hello”).

    Negative Logits
    package      -0.07
    IRD          -0.06
    ávky         -0.06
     например    -0.06
    killer       -0.06
     erv         -0.06
    يق           -0.06
    оки          -0.06
    işti         -0.06
    ery          -0.06
    Positive Logits
    (ball          0.07
     BET           0.06
    】,【           0.06
    :&             0.06
     fastball      0.06
    BindingUtil    0.06
     veto          0.06
    .SetFloat      0.06
    ině            0.06
    _bio           0.06
    Activation Density: 0.035%

    No Known Activations