Explanations
The neuron fires on meta-instructions that specify the assistant's speaking style (e.g. directives containing "speech," "informal," "should," or "respond").
Negative Logits
aqui             -0.06
んでいる          -0.06
ored             -0.06
взрос            -0.06
encuent          -0.06
PendingIntent    -0.06
engr             -0.06
他們              -0.06
.shopping        -0.06
hypoc            -0.06
Positive Logits
walker           0.07
_gt              0.07
(&$              0.07
(lock            0.07
jams             0.06
repost           0.06
Translate        0.06
олом             0.06
ifth             0.06
Cleaning         0.06
Activations Density: 0.025%