Explanations
instructions
This neuron fires at the start of the assistant's reply, especially on speaker-tag and greeting tokens (e.g. “assistant”, “Hello”).
Negative Logits
package     -0.07
IRD         -0.06
ávky        -0.06
например    -0.06
killer      -0.06
erv         -0.06
يق          -0.06
оки         -0.06
işti        -0.06
ery         -0.06
Positive Logits
(ball          0.07
BET            0.06
】,【           0.06
:&             0.06
fastball       0.06
BindingUtil    0.06
veto           0.06
.SetFloat      0.06
ině            0.06
_bio           0.06
Activations Density: 0.035%
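
As a rough illustration of what a density figure like this can mean, the sketch below assumes that activation density is the percentage of tokens on which the neuron's activation exceeds a threshold of zero. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration; the page itself does not state how the value is computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumed definition for illustration only; not taken from the page.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 100,000 tokens, 35 of which activate the neuron.
sample = [0.0] * 99_965 + [1.2] * 35
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.035% would correspond to roughly 35 active tokens per 100,000, which fits a neuron that responds only at one narrow position such as the start of a reply.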