Explanations
This neuron detects placeholder instruction fragments, specifically the “[ insert … here ]”-style tokens used to mark where a substitution should go.
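To make the described pattern concrete, here is a minimal sketch of a text-level proxy for it. It is not how the neuron computes its activation; the regular expression and the helper name `find_placeholders` are assumptions chosen for illustration, based only on the explanation above.

```python
import re

# Hypothetical proxy pattern (an assumption, not derived from the neuron itself):
# an opening bracket, the word "insert", any non-bracket text, the word "here",
# and a closing bracket, with optional whitespace inside the brackets.
PLACEHOLDER_RE = re.compile(r"\[\s*insert\b[^\[\]]*?\bhere\s*\]", re.IGNORECASE)


def find_placeholders(text: str) -> list[str]:
    """Return every "[ insert ... here ]"-style fragment found in text."""
    return PLACEHOLDER_RE.findall(text)


if __name__ == "__main__":
    sample = "Dear [insert name here], your order [ insert order ID here ] has shipped."
    for fragment in find_placeholders(sample):
        print(fragment)
```

A check like this could be used to pull candidate test prompts from a corpus before comparing them against the neuron's actual top activations.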
Negative Logits
Token       Logit
appeared    -0.08
HOH         -0.07
antidad     -0.07
exus        -0.06
ethe        -0.06
LES         -0.06
Payne       -0.06
ばかり      -0.06
too         -0.06
ázev        -0.06
Positive Logits
Token       Logit
Viewer      0.07
trứng       0.06
quad        0.06
packet      0.06
,col        0.06
--> ↵       0.06
тесь        0.06
grammar     0.06
exclude     0.06
converged   0.06
Activations Density: 0.012%