(**UPDATE**: My statements in this blog entry about the Göös Orponen paper were based on an earlier draft, which was on an author’s web page, but was not the final, published version. I have updated the link to their paper, and struck out my statements that are not consistent with their final draft. In particular, the final version of their paper reads, “Ma & Lombardi show a certain derivative of the above optimization problem NP-hard in [7]. However, to our knowledge, a proof of the NP-hardness of the PATS problem as stated above is lacking.” Please see the comment of Mika Göös below for more information.)

One of my research interests is the DNA self-assembly of multicolored shapes. Almost all DNA self-assembly literature — both theory and practice — has focused on the assembly of either 1-color or 2-color shapes. Nevertheless, if we think of the DNA assembly as a substrate, or scaffolding, upon which transistors or nanomachines are placed later, different colors can correspond to OR gates or NAND gates, or the “landing pads” of much more sophisticated objects. A while back, I did a literature search on the self-assembly of multicolored shapes, and I thought I “knew everything” that was currently in the literature. However, I was wrong. I recently discovered papers in the literature track of Computer Assisted Design of integrated circuits (a track I had never followed before, and clearly I will have to start), whose objective is to tackle DNA self-assembly of multicolored rectangles, an obviously related special case of “my” problem. I will discuss two of these papers in this blog entry.

The two papers, both by Xiaojun Ma and Fabrizio Lombardi of Northeastern University, are On the Computational Complexity of Tile Set Synthesis for DNA Self-Assembly, and Combinatorial Optimization Problem in Designing DNA Self-Assembly Tilesets. The second is much shorter, reads like an announcement of results, and appeared in a workshop. The first appeared in a journal and contains a full proof of an NP-completeness result. I had initially intended this blog entry to be a technical discussion of the NP-completeness result, which I find clever, and I think it defines a graph model of tilesets that might be useful in other settings. However, when I looked into the literature, I found things were a bit of a mess. ~~A subsequent paper by other authors incorrectly states that Ma and Lombardi proved something stronger than they actually did~~, and this error is understandable, because Ma and Lombardi changed the informal description of their theorem in between the workshop paper and the journal paper. (The workshop paper makes a stronger claim, which the journal proof does not support.) So the rest of this entry reads more like a detective story — or a he-said she-said dialogue (!) — than an exposition. It took me a couple days to separate claims from proofs, and I hope that, by writing this up, I will save other researchers some time.

First, let’s state the problem about multicolored rectangles we would ideally like to solve.

Input: A set of unit squares in the shape of a rectangle; each square is colored with one of different colors.

Output: A minimum set of DNA tiles whose unique terminal assembly is the input rectangle.

I won’t go into exactly how DNA tiles are defined, either abstractly or in the lab; nor will I define self-assembly formally. The theoretical framework for all papers I consider in this blog entry is the Abstract Tile Assembly Model, due to Winfree and Rothemund, which is explained at the beginning of these PowerPoint slides by David Doty. My focus will be on the relationships between different algorithmic problems — which are claimed, and which are actually proved.

~~For example, Göös and Orponen, in Synthesizing Minimal Tilesets for Patterned DNA Self-Assembly, state that Ma and Lombardi prove the following problem to be NP-complete~~:

Input: A set of unit squares in the shape of a rectangle; each square is colored with one of different colors.

Output: A minimum set of DNA tiles that builds rectilinearly, and whose unique terminal assembly is the input rectangle.

I have underlined the change from the previous problem. Intuitively, a rectilinear tileset can only grow in one direction, e.g., northeast: the first tile is placed at , and each subsequent tile must be placed in the first quadrant — either due north or due east of a tile that has already been placed. This is, of course, a significant limitation on the space of all possible tilesets that solve our original problem, but the new problem is still nontrivial, and the experimental reality is that it is much easier to control rectilinear growth in solution than to control other behaviors.

~~Göös and Orponen cite the Ma Lombardi journal article in support of this statement~~. While it is true that Ma and Lombardi made essentially this claim in their workshop paper, in the journal article they state something different, which is that the following problem is NP-complete.

Input: A tileset that rectilinearly assembles a multicolored rectangle by placing a unique tile on each location of the rectangle.

Output: A minimum tileset that rectilinearly assembles that rectangle, obtained from the input tileset by a process of “tile synthesis.”

The difference is that, instead of saying the problem is NP-complete with respect to all possible algorithms, Ma and Lombardi limit their approach to the process of “tile synthesis,” performed on an input trivial tileset. Briefly, Ma and Lombardi define a *tile graph* and a *bond graph* from the input tileset. The bond graph is a bipartite graph, and the edges are colored with the colors of the original tiles that possess the corresponding bonds. The process of tile synthesis is the merging of vertices on one side of the bipartite graph so that, when edges are merged, there is no color conflict in the new merged edge. (Two black edges can make a single black edge, but a black edge cannot merge with a white edge.)

Furthermore, the proof itself, which is a reduction to and from the Minimum Graph Coloring Problem, actually shows the following problem to be NP-complete.

Input: A tile graph and bond graph.

Output: The minimum-vertex tile graph and bond graph obtainable by tile synthesis (i.e., vertex merging).

Now I may well be a major doofus here — it has happened before — but I don’t see how to get from the NP-completeness of this problem to the NP-completeness of the previous problem. The input set of the previous problem (tile graph and bond graph obtained from trivial tileset that rectilinearly self-assembles a multicolored rectangle) is strictly contained in the input set to this problem (all tile and bond graphs). So, in the final analysis, it seems to me that for every single problem, except this very last one, the computational complexity of the problem *is still open*.

I will have to think about this some more.

X. Ma, & F. Lombardi (2008). Combinatorial Optimization Problems in Designing DNA Self-Assembly Tile Sets IEEE International Workshop on Design and Test of Nano Devices, Circuits and Systems, 73-76 DOI: 10.1109/NDCS.2008.7

Ma, X., & Lombardi, F. (2009). On the Computational Complexity of Tile Set Synthesis for DNA Self-Assembly IEEE Transactions on Circuits and Systems II: Express Briefs, 56 (1), 31-35 DOI: 10.1109/TCSII.2008.2010161

Mika Göös, & Pekka Orponen (2011). Synthesizing Minimal Tile Sets for Patterned DNA Self-Assembly DNA Computing and Molecular Programming, LNCS, 6518, 71-82 DOI: 10.1007/978-3-642-18305-8_7

You have hit upon the same point of confusion about which I had a series of e-mails with Ma and Lombardi (without really reaching a satisfying conclusion). In short, the problem I see with the claim of Ma and Lombardi is that arbitrary “tile graphs + bond graph” representations don’t seem have corresponding multicoloured patterns, i.e., Ma and Lombardi only give you a way of converting a multicoloured pattern into a “tile graph + bond graph” representation but not the other way around (and this is required for the NP-hardness proof).

Unfortunately, our paper that you *link to* is an *old* draft of our article (I have to get this fixed). An up-to-date version can be found following the DOI reference you give or from arXiv: http://arxiv.org/abs/0911.2924. There, we note that the rectilinear assembly problem as we state it has not been proved NP-hard.

To add to the confusion, a recent paper by Demaine et al. on “One-Dimensional Staged Self-Assembly” also states “the problem of finding a minimum-size tile set has been shown to be NP-hard” referring to both our article and the article of Ma and Lombardi.

Thanks very much for your comment, Mika Göös. I’ve updated the link to your paper, and corrected the blog entry.

I am one of the authors of the Demaine et al. paper referenced by Mika Göös.

You are correct that we reference the Ma and Lombardi paper as proving that the problem is NP-hardness. But it is in error, and occurred (I think) because of the same confusion as this blog post explores.

Maybe I can shed some light on where the difficulty (for me) in determining what Ma and Lombardi have shown occurs.

Quotes below are taken from:

X. Ma and F. Lombardi, Synthesis of tile sets for DNA self-assembly, IEEE. Trans. on Computer-Aided Design of Int. Circuits and Sys. 27(5), 2008.

Page 1:

“Definition 1: If through self-assembly a tile set T can construct a pattern P of finite size, then T is said to be an assembling tile set of P .

Definition 2: An optimum tile set for a pattern of finite size is an assembling tile set of minimum cardinality (i.e., with the smallest possible number of tile types) for that pattern.

Definition 3: The PATS problem finds the optimum tile set for a finite size pattern.”

Page 2:

“Theorem 2: The PATS problem is equivalent to the coloring problem, and therefore, it is NP-complete.”

Between these two quotes, there are further additions to the definition of the PATS problem:

“Only the synthesis of the rule tiles is considered in this paper, and self-assembly takes place from south–east to north–west.”

and

“Starting from a trivial tile set of P , valid tile sets for P with reduced number of tile types can be generated by merging the bond types.”

A similar thing happens in the longer version of their paper:

X. Ma and F. Lombardi, Combinatorial optimization problem in designing DNA self-assembly tile sets, IEEE. Intl. Workshop on Design and Test of Nano Devices, 2008.

There are definitions of `assembling tile set’ and `PATS problem’, that when taken alone are equivalent to finding the smallest aTAM at tau = 2 that assembles a given multicolored rectangular pattern. However, there are additional restrictions outside of these definitions that seem to modify them, and thus restrict the problem to the last one defined above by Aaron in his post.

I apologize for the long post and our paper adding to the confusion, but I hope my response here helps clarify.

-Andrew

Pingback: DNA self-assembly of multicolored rectangles (via Nanoexplanations) | First Praxis

I am also interested in this problem and its variants. I read the shorter paper and wrote emails to the authors with similar questions but they never responded. For that reason I never bothered reading the longer paper, but I guess that would have clarified for me that the shorter paper was mistaken in its main claim.

So I agree with Aaron that the NP-completeness of all variants of this patterning problem that ask for a minimum tile set (without restricting to some class of algorithms producing the tile set) should be considered open problems. We asked similar open questions here (http://www.dna.caltech.edu/~ddoty/papers/pnsa.pdf questions 6,7,8) and I would be very excited by a solution to any of these problems.

Pingback: Weekly Picks « Mathblogging.org — the Blog