Explanations
explanations/confirmations
This neuron detects polite acknowledgment phrases in the assistant’s replies, especially “Thank you for the additional information.”
Negative Logits
Token       Logit
голови      -0.08
(fn         -0.07
hair        -0.06
район       -0.06
πλ          -0.06
ặng         -0.06
排名        -0.06
QUEST       -0.06
appId       -0.06
anal        -0.06
Positive Logits
Token       Logit
erotische   0.07
(IService   0.07
Рус         0.07
lbl         0.07
//$         0.07
irit        0.07
мотря       0.06
Tail        0.06
]%          0.06
")->        0.06
Activations Density 0.014%
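
The page does not say how the density figure is computed. A common reading, assumed here, is the fraction of token positions on which the neuron's activation is above zero. The sketch below illustrates that reading only: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up data: 100,000 token positions, 14 of them with a nonzero activation.
sample = [0.0] * 99_986 + [1.5] * 14
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.014% would mean the neuron fires on roughly 14 out of every 100,000 tokens, which fits a neuron tied to a narrow phrase pattern.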