INDEX
    Explanations

    explanations/confirmations

    This neuron detects polite acknowledgment phrases in the assistant’s replies, especially “Thank you for the additional information.”

    Negative Logits
     голови    -0.08
    (fn        -0.07
    hair       -0.06
     район     -0.06
    πλ         -0.06
    ặng        -0.06
    排名       -0.06
    QUEST      -0.06
     appId     -0.06
    anal       -0.06
    Positive Logits
     erotische    0.07
    (IService     0.07
     Рус          0.07
    	lbl          0.07
     //$          0.07
    irit          0.07
    мотря         0.06
    Tail          0.06
    ]%            0.06
    ")->          0.06
    Activation Density: 0.014%

    No Known Activations