Explanations
The neuron activates on tokens that introduce procedural instructions or advice, especially words like “start,” “by,” and “asking” that cue a step or method.
Negative Logits
आपक    -0.07
ditch    -0.07
kết    -0.06
jsx    -0.06
اپ    -0.06
uchs    -0.06
příst    -0.06
эф    -0.06
�    -0.06
grandson    -0.06
Positive Logits
pleted    0.07
flawed    0.07
Republicans    0.06
لي    0.06
aña    0.06
SID    0.06
FAT    0.06
vertis    0.06
consulted    0.06
.op    0.06
Activations Density 0.028%
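
To make the density figure concrete, here is a minimal sketch of how such a percentage could be computed. It assumes that "activations density" means the share of sampled tokens on which the neuron fires above a threshold; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens with activation > threshold) / (all tokens).
    This is an illustrative definition, not one confirmed by the source page.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 100,000 tokens, 28 of which activate the neuron.
sample = [0.0] * 99_972 + [1.0] * 28
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.028% would mean the neuron fires on roughly 28 of every 100,000 tokens, which fits a neuron that responds to a narrow set of cue words.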