Most striking to me was how differently reporters assessed the accuracy of Stratfor’s intel, depending on geography. Apparently, Stratfor investigated PETA on behalf of Coca-Cola, and investigated Bhopal activists on behalf of Dow Chemical. While some might find this concerning, I heard no indication that the information obtained by those efforts was false. In contrast, two reporters from the Al Akhbar newspaper in Lebanon stated that much of the information gathered about the situation in Beirut was false.

The Al Akhbar reporters said this situation was a particular problem because the CIA had recently been forced to shut down its intelligence operations in Lebanon, which increased US reliance on a private firm like Stratfor. Apparently, though, to maximize profits, Stratfor produced much of its intel on Lebanon by running open-source Arabic material through Google Translate, literally losing the meaning in translation, instead of hiring analysts fluent in the language. Further, their evaluation of sources was, according to one reporter, “racist,” in the sense that if an ideologically extreme Arab made one statement and an ideologically extreme Israeli made a contradictory one, Stratfor analysts would discount the Arab and take the Israeli seriously.

I’ve read only a few of the emails myself, and I can’t speak to the accuracy of any claim. However, it does seem clear that the notion of Stratfor just being a service that reads and analyzes open-source material is incorrect. Unless the released emails are heavily fabricated, Stratfor initiated intelligence gathering operations on the ground, bribed confidential informants around the world, and encouraged their employees to control sources by “psychological” or “sexual” means.

Finally, no matter your personal political persuasion, Stratfor’s internal glossary of intelligence terms is *hilarious*. I will close with some definitions from it.

Backgrounder: General analysis that gives the customer better situational awareness. The customer never actually reads the Backgrounder. Its primary use is as cover when the customer screws something up. Backgrounders are the basic intelligence tool for shifting blame to the customer.

or

He Won the Cold War: Egomaniacal Bullshitter

and

He Won the Vietnam War: Deranged Egomaniacal Bullshitter

and, in conclusion, a definition made more intriguing by (and perhaps at odds with) the claims of the Al Akhbar reporters:

Duplicitous Little Bastards: Israeli intelligence

Tagged: security, social history of computing

My account is @aaron_sterling, and you can see it in the rightmost column of this blog. Here are three items that are good examples of things I found interesting, but which, after today, I won’t be “elevating” to the status of a blog entry.

- The computer security company McAfee has produced a document titled 2012 Threat Predictions (pdf file). I skipped over some of it, but the parts I read were fascinating. For example, they see Bitcoin as an extremely insecure currency, they believe illegal spam will diminish and be replaced by “legal spam” (equally annoying), and they think far more attackers will target hardware exploits instead of the traditional software exploits. Worth a look.
- Enrique Zabala has produced a Flash animation that explains Rijndael/AES visually. It is beautiful.
- Rajarshi Guha and co-authors are designing a type-ahead chemical substructure search engine. This addresses a longstanding open problem in cheminformatics: searching for chemicals in a database is slow (in the worst case probably exponential, because the Subgraph Isomorphism Problem is NP-complete), but can it be made faster? At least for important special cases, this tool seems to be competitive in speed with Google’s type-ahead search engine for other content: it offers the chemist suggestions, based on the prefix of the input available, before the chemist even hits the enter key.
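The type-ahead interface itself is simple to sketch; the hard part, which Guha et al. address, is doing the matching on substructures rather than string prefixes. Here is a toy Python illustration of the user-facing behavior, using a made-up prefix index over molecule names (everything here is illustrative, not their method):

```python
# Toy prefix index for type-ahead suggestions. Real chemical type-ahead
# must match substructures, not string prefixes; this only illustrates
# the user-facing "suggest before Enter is pressed" behavior.
from collections import defaultdict

def build_prefix_index(names):
    index = defaultdict(list)
    for name in names:
        for i in range(1, len(name) + 1):
            index[name[:i]].append(name)
    return index

def suggest(index, typed, limit=5):
    return index.get(typed, [])[:limit]

molecules = ["benzene", "benzaldehyde", "toluene", "phenol"]
index = build_prefix_index(molecules)
print(suggest(index, "benz"))
```

The precomputed index trades memory for instant lookups, which is the same trade-off any type-ahead system makes.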

Tagged: cheminformatics, chemoinformatics, cryptography, security

The hacks that do the most damage don’t have Twitter feeds.

Another security expert, Jeremy Falkenrath, in an interview on Bloomberg News (at about 7:00 into the video), discussed, quite matter-of-factly, the hacker-for-hire market that companies in the chemical industry deploy against one another to learn trade secrets. With this as the backdrop, I’d like to discuss one of the main open questions of cheminformatics: *Is secure encryption of molecules possible?* For example, it would be nice if a company could encrypt a molecule, but then allow some third party to run *in silico* tests with it, having access to the molecule’s properties but not the structure itself.

**Encryption of molecules**

Part of the reason for the traditional closed-data policies of pharmaceutical companies is the total absence of any way to encrypt chemical structural information. This has been recognized as an open problem for many years; the American Chemical Society held a special meeting about it in 2005, a summary of which appeared in Nature. While some presenters at that meeting felt molecular encryption was possible, and others felt it was impossible, the practical reality as we enter 2012 is that, so far, the voices in favor of “impossible” have been correct. Almost no new theoretical literature has been produced since 2005, and the industry appears no nearer a practical solution than it was in, say, 1975.

I recently had an idea to expand upon a proposal by Eggers et al. in 2001, to watermark *in silico* representations of molecules. My idea, however, is going nowhere — just like all other attempts so far to implement chemical watermarking. At least I can get a blog entry out of my failure though! I hope readers of this page find my little attempt entertaining or informative.

**Acknowledgement**: The material in this post is based on conversations I have had with cheminformaticians Rajarshi Guha and Jörg Kurt Wegner.

**Watermarking molecules**

A digital watermark, of course, is a special code embedded into a file so that its provenance can be determined. The file has to be large enough so that perturbations to individual data points are not feasibly discoverable to a would-be forger, even though there are enough data points to produce a unique fingerprint for that file, at least with high probability. In 2001, Eggers et al. published Digital Watermarking of Chemical Structure Sets, which proposed the use of SCS watermarking on databases of molecules, so that the company that had created or owned the molecules would be able to prove that another company’s claim to independent discovery was, in fact, infringement.

A single molecule is not large enough to embed an indiscernible watermark, which is why Eggers et al. considered “structure sets” — a large list of molecules that are somehow related. Technically, each molecule is considered as a graph embedded into three-dimensional space. (In fact, this graph may be the “reduced graph,” or a graph of the molecule with all hydrogen atoms removed, because reduced graphs performed better on one empirical test in their study.) I will ignore a lot of technical details here, but the core idea behind their watermarking process is to perturb the 3D coordinate of specially-chosen atoms by a small, “random-appearing” amount. If enough locations of atoms in the structure set are modified this way, it produces a digital watermark.
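To make the perturbation idea concrete, here is a toy sketch of my own in Python; it is not the SCS scheme of Eggers et al. A secret key selects atoms pseudo-randomly, and each selected atom’s z-coordinate is nudged by a tiny signed amount encoding one watermark bit:

```python
import random

EPS = 0.001  # perturbation size; assumed far below chemical significance

def embed(coords, bits, key):
    """Perturb the z-coordinates of key-selected atoms to encode bits.
    coords: list of (x, y, z) tuples for the whole structure set."""
    rng = random.Random(key)
    marked = [list(c) for c in coords]
    for pos, bit in zip(rng.sample(range(len(coords)), len(bits)), bits):
        marked[pos][2] += EPS if bit else -EPS
    return [tuple(c) for c in marked]

def extract(marked, original, key, nbits):
    """Recover the bits; the same key re-selects the same atom positions."""
    rng = random.Random(key)
    positions = rng.sample(range(len(original)), nbits)
    return [1 if marked[p][2] > original[p][2] else 0 for p in positions]

coords = [(float(i), 0.0, 0.0) for i in range(100)]  # fake structure set
bits = [1, 0, 1, 1, 0, 0, 1, 0]
marked = embed(coords, bits, key="secret")
recovered = extract(marked, coords, "secret", len(bits))
print(recovered)
```

Note that this toy detector compares against the unmarked original, which a practical (blind) scheme must avoid, and it has none of the robustness to reordering and geometric transforms discussed below.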

More interesting to me than the watermarking method was the list of potential attacks Eggers et al. considered. If the attacker can do *anything*, it is famously difficult to encrypt data while still providing functionality. However, we are working in the chemistry domain. On the one hand, this adds technical obstacles (due to “real-world messiness”), but, on the other hand, it may limit the attacks that can be performed on the system, since the attackers need to obtain a structure that is chemically useful, not just some fragment of plaintext. Eggers et al. tried to provide a watermarking method resilient to the following attacks.

- Removal of data from the original dataset.
- Injection of non-watermarked structures into the dataset.
- Reordering of individual records in the dataset.
- Reordering of atoms and bonds in the structure records.
- Global 3D transforms, e.g., rotations or translations.
- Changes of structural notation conventions.
- Removal of hydrogen atoms from structures.

**My new idea and why it fails**

I will cut to the chase and explain why my idea doesn’t work — and why, perhaps, no watermarking concept is feasible in the chemical domain. Then I will conclude, anticlimactically, by describing my idea, in case someone can get something out of it.

Consider the domain of music piracy. The attacker can hear the music, and perhaps even has access to guitar tabs or a symphony score. However, it would not be cost-effective for the attacker to hire a brand-new orchestra to re-record the original piece of music, and, even if the attacker did that, the new track would not sound exactly the same. By contrast, in chemistry, every carbon atom is identical to every other carbon atom. So in the case of a chemical structure set, the attacker could choose the molecule(s) of interest, rebuild each structure from scratch, and use the 3D coordinates (and all other information) from the brand-new molecule. Presto, no watermark.

Eggers et al. consider this attack, sort of. They say:

We assume that no unlicensed copies of the software used to generate the original protected dataset are provided in circulation. Furthermore, the computation time for large datasets is often significant. Depending on the type of algorithm used and the size of the dataset, it can be up to several CPU months. Thus, simple regeneration of the data is often not a feasible approach.

While the claim about significant recomputation time may have been true in 2001, it no longer appears to be. As one chemist said to me, recomputation of a molecular structure “wouldn’t be hard at all.” This leaves, as the only defense of a watermark, the unavailability of licensed software to perform the recomputation. That, however, relies on an attacker who is willing to perform industrial espionage but who draws the line at software piracy. So, no go.

Now we get to my idea. In the field of image processing there has been a lot of work since 2001 on 3D watermarking. One survey is 3D Watermarking: Techniques and Directions, by Koz et al., which considers strengths and weaknesses of many different watermarking techniques. One method that caught my eye was Robust Image Watermarking Based on Generalized Radon Transformations, by Simitopolous et al., because it is specifically designed to be resistant to “geometric attacks” like rotation, scaling, translation and cropping, which closely follow the list of anticipated attacks in the Eggers et al. paper. Applying this to chemistry might provide a much better watermarking tool than the more generic watermarking method chosen by Eggers et al.

However, I’ve decided not to consider this further, because it seems that the most I could get out of it would be a paper that would be theoretically “cute,” with no chance of being practical, ever. I would love to be proven wrong, so if you can think of a way to get around the “rebuild the molecule from scratch” attack, please do comment. In the meantime, I am on to other things.

Joachim J. Eggers, W. D. Ihlenfeldt & Bernd Girod (2001). Digital Watermarking of Chemical Structure Sets. Information Hiding, 200-214. DOI: 10.1007/3-540-45496-9_15

Tagged: cheminformatics, chemoinformatics, cryptography

**Introduction**

Ian Stewart is one of the premier popularizers of mathematics. He has written over twenty books about math for lay audiences. He has also co-authored science fiction, and books on the science of science fiction (three books on *The Science of Discworld*). In his newest effort, *The Mathematics of Life*, Stewart focuses his talents on the mathematics of biology, and the result is superb. In an easy, flowing read, with dozens of diagrams and scholarly footnotes — but without a single formula — he introduces the reader to a wide range of interactions between mathematicians and biologists. I heartily recommend this book.

**Turing’s morphogenesis in the modern day**

*The Mathematics of Life* contains 19 chapters. Chapter 8, “The Book of Life,” focuses on the Human Genome Project, and algorithmic challenges of DNA sequencing. However, as this is possibly the area most familiar to SIGACT News readers, I will only mention it briefly and, instead, focus on chapters that introduced me to areas of mathematical biology I had not previously encountered.

Perhaps the most direct connection to (the roots of) theoretical computer science comes in Chapter 13, “Spots and Stripes,” where Stewart considers Alan Turing’s famous paper, The Chemical Basis of Morphogenesis, and sketches the development of biological thought about animal markings since Turing’s groundbreaking proposal. As Stewart says:

For half a century, mathematical biologists have built on Turing’s ideas. His specific model, and the biological theory of pattern-formation that motivated it, turns out to be too simple to explain many details of animal markings, but it captures many important features in a simple context, and points the way to models that are biologically realistic.

Turing proposed “reaction-diffusion” equations to model the creation of patterns on animals during embryonic development. As noted by Stewart, Hans Meinhardt, in The Algorithmic Beauty of Seashells, has shown that the patterns on many seashells match the predictions of variations of Turing’s equations. The mathematician James Murray extended Turing’s ideas with wave systems, and proved the following theorem: *a spotted animal can have a striped tail, but a striped animal cannot have a spotted tail*. Intuitively, this is because “the smaller diameter of the tail leaves less room for stripes to become unstable, whereas this instability is more likely on the larger-diameter body.”
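Turing-style pattern formation is easy to watch in simulation. Below is a minimal 1D Gray-Scott reaction-diffusion model in Python, a standard modern descendant of Turing’s equations; the parameter values are conventional demo choices, not anything from Stewart’s book. A seeded patch of slow-diffusing activator can self-organize into localized structure:

```python
# 1D Gray-Scott system: u is the fast-diffusing substrate, v the
# slow-diffusing activator. Explicit Euler steps, periodic boundaries.
import random

N, STEPS = 120, 3000
Du, Dv, F, K = 0.16, 0.08, 0.035, 0.065   # conventional demo parameters

u = [1.0] * N
v = [0.0] * N
rng = random.Random(0)
for i in range(N // 2 - 5, N // 2 + 5):   # seed a small patch of activator
    v[i] = 0.5 + 0.1 * rng.random()

def lap(a, i):
    # discrete Laplacian with wrap-around
    return a[(i - 1) % N] - 2.0 * a[i] + a[(i + 1) % N]

for _ in range(STEPS):
    u, v = (
        [u[i] + Du * lap(u, i) - u[i] * v[i] ** 2 + F * (1.0 - u[i])
         for i in range(N)],
        [v[i] + Dv * lap(v, i) + u[i] * v[i] ** 2 - (F + K) * v[i]
         for i in range(N)],
    )

# rough ASCII view of where the activator has concentrated
print("".join("#" if x > 0.2 else "." for x in v))
```

Varying F and K moves the system between Turing-like regimes (spots, stripes, decaying pulses), which is exactly the parameter sensitivity Stewart describes for animal-marking models.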

**Teaser for full review**

In the pdf I also discuss Stewart’s presentation of evolutionary game theory to model biological evolution, and the use of high-dimensional geometry to model the self-assembly of chemical virus coats. Beyond that, there is much more great material in Stewart’s book that I did not mention at all. The prose style is friendly and clear throughout, without talking down to the reader.

I consider this to be an excellent introduction to the mathematics of biology, for both amateurs and professionals. Seasoned researchers are likely to learn “teasers” about areas unfamiliar to them, and smart people “afraid of math” can read the book and enjoy the material. Highly recommended. I will conclude this review with the same words Stewart used to conclude the book:

Instead of isolated clusters of scientists, obsessed with their own narrow specialty, today’s scientific frontiers increasingly require teams of people with diverse, complementary interests. Science is changing from a collection of villages to a worldwide community. And if the story of mathematical biology shows anything, it is that interconnected communities can achieve things that are impossible for their individual members.

Welcome to the global ecosystem of tomorrow’s science.

Ian Stewart (2011). The Mathematics of Life. ISBN 0465022383

Tagged: book review, general audience

I’ve deleted my credit card information from Zappos, and from one other online retailer I use. To be honest, I’m not sure who else might have my sensitive information — and I bet I’m not alone in that. I’m not sure what precautions I will take in the future when shopping online, but I plan never to save my credit card information again.

Stay safe, everyone.


Stratfor, by the way, finally has their website back online, with a Hacking News section, in which they tell their side of the story. (They verify that they stored credit card information in cleartext, as Anonymous had claimed, and they state that they were working with the FBI on an investigation into a hack of their systems before the hack went public on Christmas Eve.) About a week ago, the hackers released a zine which includes a press release about the Stratfor hack and two others, and a log of the hacks themselves.

**Passwords have always been weak**

Zviran and Haga published Password Security: An Empirical Study in 1999, analyzing passwords used at a Department of Defense facility in California. They discovered that the vast majority of passwords in use were extremely insecure: 80% of the passwords were 4-7 characters in length, 80% used alphabetic characters only, and 80% of the users had never changed their password. Fast-forwarding to the present, the passwords of Stratfor subscribers are not much better. The most common password is “stratfor”, and there are even single-character passwords.

Steve Ragan was one of the first to publish an analysis of the Stratfor passwords. His conclusion:

We’re sorry to report that the state of password management and creation is still living in the Dark Ages.

Ragan cracked almost 82,000 of the hashed passwords in under five hours, using one desktop computer and Hashcat, an off-the-shelf password cracking tool. He provides lists of the most common passwords, and his own “favorites” from the database, including the password ***** (five asterisks).
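Unsalted, fast hashes like MD5 are what make this feasible on one desktop. A dictionary attack is only a few lines of Python; the “leaked” hashes and the wordlist below are illustrative, not data from the actual dump:

```python
import hashlib

def md5_hex(pw):
    return hashlib.md5(pw.encode()).hexdigest()

# A "leaked dump" of unsalted MD5 hashes (illustrative, not real data).
leaked = {md5_hex("stratfor"), md5_hex("123456"), md5_hex("Tr0ub4dor&3")}

# A tiny demo wordlist; real attacks use lists with millions of entries.
wordlist = ["password", "stratfor", "letmein", "123456"]

# One hash per candidate word: with no salt, one pass cracks every
# account that chose a dictionary word.
cracked = {w: md5_hex(w) for w in wordlist if md5_hex(w) in leaked}
print(sorted(cracked))
```

The absence of per-user salts is the crucial failure: with salts, each guess would have to be re-hashed once per account rather than once for the whole dump.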

The most comprehensive analysis of the Stratfor passwords I have seen is by Gerrit Padgham, who cracked 86% of the unique hashes using GPU technology. Padgham writes:

Probably the surprising and under-reported insight we found is that a majority (about 630K) of the passwords we recovered appear to be randomly generated by the Stratfor site at registration time. These passwords all have a very specific set of characteristics. They are eight characters long. They consist of uppercase and lowercase letters, and digits (‘mixedalphanum’). With a mid-range dual GPU machine, we were able to test and recover all passwords for that entire character set and length in just over 24 hours.

So what does this tell us? It’s likely that during enrollment, the system generates a password automatically, and e-mails it to the user. Normally users are required to change the randomly generated password on subsequent logins.

So it seems that well over three-quarters of all the breached accounts were created but never used. It’s possible that Stratfor auto-generated a password and didn’t require a change on next login, but based on our discussions with users exposed by the breach it appears that this was not the case.
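Padgham’s figure is easy to sanity-check. An 8-character password drawn from uppercase letters, lowercase letters, and digits has 62^8 (about 2.2 × 10^14) candidates; at an assumed rate of a few billion MD5 guesses per second, the whole keyspace falls in under a day:

```python
# Keyspace of an 8-character upper/lower/digit password, and the time to
# exhaust it at an assumed rate of 3 billion MD5 guesses per second
# (a rough figure for a mid-range dual-GPU rig of that era).
alphabet = 26 + 26 + 10              # 62 symbols
keyspace = alphabet ** 8             # about 2.2e14 candidates
rate = 3e9                           # guesses per second (assumption)
hours = keyspace / rate / 3600
print(f"{keyspace:,} candidates, about {hours:.0f} hours")
```

The result lands right around Padgham’s “just over 24 hours”, which is why fixed-format auto-generated passwords offer so little protection against brute force.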

**Better security protocols going forward**

There have been a lot of posts on security blogs about lessons to learn from the Stratfor hack. One, by Ben Tomhave, lists the errors in Stratfor’s (lack of) security. Tomhave was a Stratfor customer, and his information was compromised. He is clearly very angry at Stratfor, and hopes they go out of business as a result of their incompetence. (That scenario seems unlikely to me.) However, tone aside, Tomhave makes strong points.

More constructively, Nick Selby calls on security professionals not to blame the victims who had their information compromised. Selby’s point is that it is incorrect to say that someone should just make a stronger password next time, because “the passwords are the problem, yo.” His suggestion:

Rather than continue to make the users do the stupid and useless things we as security professionals tell them to do, let’s remove them from the equation. First, some basic common sense in building web applications would be nice, as would testing regularly with competent people doing the testing. Don’t let this be the end: secure stuff properly on your end to protect your users. Stop being such a cheapskate and spend some money on your security people. Test your assumptions regularly by having competent people test them. Follow the instructions of what these people say – don’t just sweep them under the rug or plan it for the 2016 fifth quarter budget cycle.

Regarding password suckage: Hey! Allowing passphrases would be nice. I don’t know how much more secure a passphrase such as “Ooh, yo – this is my secret passphrase!” is than something random and stupid like “bRo$NdoG726.” but I can tell you that it is a lot easier to remember, doesn’t force your users to fill their desk with Post-It reminders and, oh yeah, is harder to crack.

I will conclude with a link to an xkcd comic that says visually what Selby said in his blog: passphrases are strong and easy to remember; passwords are weak and hard to remember.
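The xkcd argument can be made quantitative with a back-of-the-envelope entropy calculation. The pool sizes below are illustrative assumptions (an 8000-word passphrase dictionary, a 62-symbol password alphabet), not figures from any of the linked posts:

```python
import math

# Entropy in bits of a uniformly random secret: symbols * log2(pool size).
def entropy_bits(pool, symbols):
    return symbols * math.log2(pool)

random_password = entropy_bits(pool=62, symbols=8)   # 8 random mixed chars
passphrase = entropy_bits(pool=8000, symbols=5)      # 5 random dictionary words
print(f"password: {random_password:.1f} bits, passphrase: {passphrase:.1f} bits")
```

Five easy-to-remember words beat a random 8-character string by a comfortable margin, and the gap only grows as you add words. The caveat is that the words must be chosen uniformly at random, not picked by a human.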

Moshe Zviran & William J. Haga (1999). Password Security: An Empirical Study. Journal of Management Information Systems, 15(4), 161-185

Tagged: security, social history of computing

- Derrick Stolee requested in a comment a resolution of the computational complexity of the 3D version of the problem of decomposing a shape into the minimum number of rectangles. I found a reference that proves the problem is NP-complete, by a direct reduction from a variant of 3SAT. The diagrams of the gadgets used are pretty cool: the gadgets look like children’s toys used to build 3D structures. Rectangular partition is polynomial in two dimensions but NP-complete in three, by Victor J. Dielissen and Anne Kaldewaij, Information Processing Letters, April 1991.
- The survey Polygon Decomposition by J. Mark Keil (1996) has much more information on exact algorithms for rectangulation, triangulation, and problems I did not mention at all, like covering.
- There is an extensive literature on approximation algorithms for finding a minimum-length rectangulation of an orthogonal polygon with holes. (The problem is NP-complete even for the case where the polygon is a rectangle and its interior holes are points.) I can recommend the survey Minimum Edge-Length Rectangular Partitions, by Gonzalez and Zheng (in *Handbook of Approximation Algorithms and Metaheuristics*, 2007).

Victor J. Dielissen & Anne Kaldewaij (1991). Rectangular partition is polynomial in two dimensions but NP-complete in three. Information Processing Letters, 38(1), 1-6. DOI: 10.1016/0020-0190(91)90207-X

Tagged: computational geometry, rectangular partitioning

Stratfor is a private intelligence-gathering firm whose principals have close ties to the US intelligence community. Stratfor has been called the “shadow CIA.” Anonymous claims to have obtained 200 GB of data, including 2.7 million private emails and 4000 credit cards. While big media worldwide have focused so far on the “Operation Robin Hood” nature of the attack — the hackers claim to have made $1 million in donations to charities using the credit card information — one Anonymous member has stated that the real reason for the attack was to obtain the emails, and the hackers did not expect the credit card information would be as easy to obtain as it was.

Perhaps the most interesting writing I have seen on this subject is at the site databreaches.net, which provides a timeline of the hack, and suggests that it had been going on for a week or more, without Stratfor’s knowledge. Databreaches.net also asks the reasonable question whether Stratfor might be legally liable for the compromise of credit card data, because it appears that both Texas law (where Stratfor is based) and Stratfor’s own privacy policy prohibit the storage of credit card information in cleartext. Moreover, Stratfor apparently stored the 3-digit security codes of credit cards in cleartext also, and standard security procedure is not to store those codes at all.

This situation reminded me of a comment Peter Taylor made on an answer of Peter Shor on CSTheory. Shor was answering a question about what would happen if it turned out that factoring could be solved in polynomial time. Among other things, he said, “as soon as it was known that factoring was in P, the banks would switch to some other system.” Taylor responded:

A bit off-topic, but

`as soon as it was known that factoring was in P, the banks would switch to some other system`

is largely wishful thinking. I discovered in December that a company which doesn’t do anything except process credit card details was using a variant of Vigenère with a key shorter than some runs of known plaintext. Worse, the technical director of the company wouldn’t believe me that it was insecure until I sent him some attack code. MD5, despite being widely considered broken, is still used heavily in banking.
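Taylor’s example is worth seeing concretely. For a Vigenère-style additive cipher, a run of known plaintext longer than the key hands the attacker the key by simple letter-by-letter subtraction. A minimal sketch (the cipher and plaintext here are hypothetical, not details of the company Taylor describes):

```python
# Additive (Vigenere-style) cipher over A-Z, and known-plaintext key
# recovery: when the key is shorter than a run of known plaintext,
# subtracting plaintext from ciphertext exposes the repeating key.

def vigenere(text, key, sign=1):
    # sign=+1 encrypts, sign=-1 decrypts
    return "".join(
        chr((ord(c) - 65 + sign * (ord(key[i % len(key)]) - 65)) % 26 + 65)
        for i, c in enumerate(text)
    )

def recover_key(plaintext, ciphertext, keylen):
    diffs = [(ord(c) - ord(p)) % 26 for p, c in zip(plaintext, ciphertext)]
    return "".join(chr(d + 65) for d in diffs[:keylen])

plain = "CARDNUMBERFOLLOWS"          # hypothetical known-plaintext run
cipher = vigenere(plain, "KEY")
print(recover_key(plain, cipher, 3))
```

No statistics, no search: the key falls out in one pass, which is why a key shorter than known-plaintext runs is not merely weak but trivially broken.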

For as long as I have been reading computer science theory blogs, commenters have left a lot of critical comments, along the lines of, “The result you are getting excited about is a very small advance, and has nothing to do with the real needs of industry.” At a political level, similar arguments are used to reduce funding to theoretical research of all kinds, including theoretical CS. I believe these arguments are completely incorrect, because the much more pressing problem is that *industry doesn’t use fully-implementable techniques that theorists discovered years ago*. In the cases of HBGary and Stratfor, this may well have been because the principals considered themselves “too important” to take mundane steps, but there is no doubt that data insecurity, extremely suboptimal algorithm design, and the like are rampant in the business sector. An industry, and a government, that dismisses the importance of theory will pay heavy prices in the long run.

**Postscripts**

- Jonathan Katz recently blogged about an upcoming workshop: “Is Cryptographic Theory Practically Relevant?”
- There is a short CSTheory community wiki on the difference between the theory and practice of security and cryptography.
- Databreaches.net reports that there is a series of hacks taking place in China right now, perhaps to protest a move to require the use of real names on the internet. Over 40 million users have had their information compromised. I hope everyone reading this blog stays safe, as we enter 2012.

Tagged: security

This will be my last post until the new year. As a holiday gift, please allow me to share with you the “Technical Papers Trailer” for SIGGRAPH Asia 2011. The conference itself just ended, but the video is a great example, to my mind, of how to popularize computer science.

Tagged: SIGGRAPH Asia, video

The minimum-length rectangulation algorithm appeared in Minimum Edge Length Partitioning of Rectilinear Polygons, by Lingas, Pinter, Rivest and Shamir (1982). The authors proved both a positive and a negative result. The positive result — which I will focus on today — is a dynamic programming algorithm that finds an optimal minimum-length rectangulation for any orthogonal polygon *with no interior holes*. The negative result is a proof that, if the input polygon is allowed to have holes, then the problem is NP-complete. (I discussed the proof of this result in a previous blog post.)

**Optimal polygon triangulation**

The dynamic programming algorithm for polygon rectangulation is similar in structure to a well-known one for polygon triangulation: finding a minimum-length set of nonoverlapping triangles that partitions the input polygon. One online source for the triangulation algorithm is these lecture notes by David Eppstein. As far as we know, rectangulation is harder to solve than triangulation. (Among other things, minimum-length triangulation can be solved in $O(n^3)$ time by a standard dynamic program, while the Lingas et al. algorithm runs in $O(n^4)$.) In the words of Lingas et al., “The difficult proof for the rectangular case involves showing how a given polygon can be split in a small number of ways while guaranteeing that the optimal partitioning is consistent with one of these splits.”

In brief, the triangulation problem is “simple,” because finding a triangulation of any input polygon reduces to finding a triangulation of a regular polygon. (Eppstein’s lecture notes explain why.) Since we can limit ourselves to input polygons with such nice features, the algorithm analysis is fairly simple. In the case of rectangulation, there is no known reduction to a simple class of input polygons. Instead, Lingas et al. show that only certain kinds of subfigures need to be considered, and that an optimal rectangulation of the entire polygon can be built up from optimal rectangulations of those subfigures. This makes the problem solvable via dynamic programming. We turn now to the method of rectangulation itself.
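For contrast with the harder rectangulation recurrence, here is the standard $O(n^3)$ dynamic program for minimum-weight triangulation of a convex polygon, in Python. This is the textbook algorithm, not anything from the Lingas et al. paper:

```python
from functools import lru_cache
from math import dist

def min_weight_triangulation(pts):
    """Total diagonal length of an optimal triangulation of a convex
    polygon, given its vertices in boundary order."""
    n = len(pts)

    @lru_cache(maxsize=None)
    def cost(i, j):
        # optimal cost of triangulating the sub-polygon pts[i..j]
        if j - i < 2:
            return 0.0
        best = float("inf")
        for k in range(i + 1, j):       # apex of the triangle on chord (i, j)
            w = cost(i, k) + cost(k, j)
            if k > i + 1:               # (i, k) is a new diagonal, not a side
                w += dist(pts[i], pts[k])
            if k < j - 1:               # likewise for (k, j)
                w += dist(pts[k], pts[j])
            best = min(best, w)
        return best

    return cost(0, n - 1)

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(min_weight_triangulation(square))  # one diagonal of length sqrt(2)
```

The recurrence splits on a single apex vertex per chord, giving $O(n^2)$ subproblems at $O(n)$ each. Rectangulation has no such single-vertex split, which is where the extra machinery of Lingas et al. comes in.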

**Proof of a key lemma**

A key lemma in the MELPRP paper is:

Lemma 1. In all minimum edge length solutions for any given polygon $P$, all edges and all internal corners lie on the vertex grid.

As defined in the previous blog posts in this series, the vertex grid is the set of all points that lie on both vertical and horizontal lines passing through a vertex of the input polygon. This is a very useful fact, since it limits the number of possible subrectangulations we need to evaluate. However, Lingas et al. assert it without proof. (Their 11-page paper appeared in a conference and, to my knowledge, no journal version of the paper has ever appeared.) Perhaps this lemma is “obvious,” or folklore, but Ming-Yang Kao and I needed a slightly stronger (but still equivalent) statement for work we were doing, so we reproved Lemma 1. The proof is short enough that I will reproduce it here.

Lemma. Let $P$ be a polygon with all vertices lying on integer coordinates, and let $R$ be a (finite) rectangulation of $P$. There is an algorithm, running in time polynomial in the number of cuts of $R$, that produces a rectangulation $R'$ of $P$ such that (1) all points of $R'$ lie on integer coordinates, and (2) the total length of $R'$ is at most the total length of $R$.

*Proof*. If all vertical cuts have integer x-coordinates, and all horizontal cuts have integer y-coordinates, then the rectangulation is an integer rectangulation, and we are done. Call the cuts that fail to have those properties *bad cuts*.

*Claim*: For each horizontal bad cut, we can translate it north or south, so that it will either encounter another horizontal bad cut, or reach an integer y-coordinate, without increasing the total perimeter of the rectangulation.

*Proof of Claim*: Let $c$ be the horizontal bad cut we wish to translate. Let $s_1, \dots, s_j$ be all vertical cuts incident with $c$ from the south, and let $n_1, \dots, n_k$ be all vertical cuts incident with $c$ from the north. Suppose WLOG that $k \ge j$. We translate $c$ north, extending each $s_i$ so it remains incident with $c$, and shortening each $n_i$ so it remains incident with $c$ as well. (If $j > k$ then we translate south, performing the same lengthening and shortening operations, but switching the roles of the $s_i$ and the $n_i$.) Since we are shortening the $n_i$ by at least as much total length as we are lengthening the $s_i$, we do not increase the total perimeter of the rectangulation. We continue translating to the north until we either encounter another horizontal cut, or the $y$-coordinate of $c$ is an integer.

We run the following algorithm with input $R$. Initialize $R' := R$. Order the horizontal bad cuts, and order the vertical bad cuts. Go through the horizontal bad cuts one at a time. Let $c$ be the horizontal bad cut under consideration. Translate it, without increasing the perimeter of $R'$, until it has an integer $y$-coordinate, or until it encounters another horizontal cut. If $c$ encounters another horizontal cut, call it $d$, then remove both $c$ and $d$ from $R'$ (and from the ordering of horizontal cuts of $R'$), and place a new cut $e$ into $R'$, where the west endpoint of $e$ is (min[$x$-coordinate of $c$, $x$-coordinate of $d$], $y$-coordinate of $c$), and the east endpoint of $e$ is (max[$x$-coordinate of $c$, $x$-coordinate of $d$], $y$-coordinate of $c$). (We name this process *absorption*, and say that $c$ and $d$ are *absorbed* into $e$.) If $e$ is at an integer $y$-coordinate, then we are done with this stage of the loop and we go on to the next horizontal cut in the order. Otherwise, it may be that we need to translate $e$ either in the same, or the opposite, direction as we were translating previously. We continue translating in a non-lengthening direction, and absorbing if necessary, until we place the horizontal cut at an integer $y$-coordinate. (There are at most number-of-cuts-many absorptions, so this process terminates efficiently.) Then we continue with the next horizontal cut in the ordering, until all are considered or removed from the ordering.

Once the bad horizontal cuts are all dealt with as above, we perform the same operations on the ordering of bad vertical cuts, except that we translate east-west instead of north-south. By the same arguments as above, in time polynomial in the number of vertical cuts, we place and/or absorb all vertical cuts onto integer x-coordinates without increasing the length of the rectangulation. Once all cuts have been considered, we have an integer rectangulation R' such that length(R') ≤ length(R).
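A toy version of the translate-and-absorb pass for horizontal cuts might look like this. The data layout and the snap-to-nearest-integer shortcut are my own simplifications; the real algorithm translates only in perimeter-safe directions:

```python
def snap_and_absorb(cuts):
    """Toy sketch of the translate-and-absorb pass. Each horizontal
    cut is a tuple (y, x_west, x_east). Snapping y to the nearest
    integer stands in for the careful perimeter-safe translation;
    cuts that land on the same y with overlapping x-ranges are
    absorbed into one cut spanning the min-west / max-east endpoints."""
    placed = []
    for y, xw, xe in cuts:
        merged = (round(y), xw, xe)
        survivors = []
        for cy, cxw, cxe in placed:
            if cy == merged[0] and cxw <= merged[2] and merged[1] <= cxe:
                # overlapping cut at the same height: absorb it
                merged = (cy, min(cxw, merged[1]), max(cxe, merged[2]))
            else:
                survivors.append((cy, cxw, cxe))
        survivors.append(merged)
        placed = survivors
    return placed
```

For example, two cuts at heights 1.4 and 0.6 with overlapping spans both snap to height 1 and are absorbed into a single cut.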

**Set of subfigures**

Given the input polygon P, we will decompose P into subfigures, find the minimum-length rectangulation of each subfigure, and then build from these a minimum-length rectangulation of P itself. Naively, we might expect that we have to consider every subfigure whose vertices lie on the vertex grid of P, and whose boundary is contained within the boundary of P. With a bit more analysis, it is possible to significantly reduce the set of subfigures that we need to consider. In the words of Lingas et al., a *constructed line* is "a maximal extension of a partitioning edge to include any sides of the boundary that are contiguous to the edge and go in the same direction. Two or more aligned edges may be put on the same constructed line, but this is not always the case." Lingas et al. provide the following diagram as an example of constructed lines.

We are now able to state the key fact used to limit the size of the set of subfigures that must be considered by the rectangulation algorithm.

Fact. It suffices to consider subfigures whose boundary consists of a contiguous piece of the original boundary plus at most two constructed lines, themselves contiguous.

**The partitioning rule**

Fix a subfigure F that satisfies the previous Fact. We assume for the moment that we have already considered all (Fact-satisfying) subfigures that are contained inside F. We choose a *candidate point* for F by the following method.

- If F has 0 constructed lines, choose any vertex of F.
- If F has 1 constructed line, choose either endpoint of the constructed line.
- If F has 2 constructed lines, choose the point where the constructed lines meet.
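The three-case rule is simple enough to state as code. The `Line` and `Subfigure` types below are illustrative stand-ins for the real geometry, not anything from the paper:

```python
from collections import namedtuple

# Illustrative stand-ins for the real geometry types (my own schema):
Line = namedtuple("Line", "endpoints")        # pair of (x, y) points
Subfigure = namedtuple("Subfigure", "constructed_lines vertices")

def candidate_point(fig):
    """Pick a candidate point for `fig` per the three-case rule above."""
    lines = fig.constructed_lines
    if len(lines) == 0:
        return fig.vertices[0]            # any vertex will do
    if len(lines) == 1:
        return lines[0].endpoints[0]      # either endpoint will do
    # two constructed lines: return their shared corner
    a, b = lines[0].endpoints
    return a if a in lines[1].endpoints else b
```

For a subfigure with two constructed lines meeting at the origin, the candidate point is that corner.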

Once we have chosen the candidate point c, we consider each point on the vertex grid as a possible *matching point*. A matching point is one that defines a rectangle when paired with c; that is, the two points are opposite corners of a rectangle. There are at most O(n²) possible matching points inside any subfigure, since the vertex grid of an n-vertex polygon has O(n²) points, and we only need to consider the points such that the induced rectangle lies entirely within the subfigure. (There are additional optimizations available as well. For example, if c is a concave vertex, then its matching point must have the same x- or y-coordinate as c. I will leave this optimization to one side, though.)
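The matching-point search described above amounts to a scan over the vertex grid with a containment filter. Here is a minimal sketch, where the grid coordinates and the containment test are supplied by the caller (all names are my own):

```python
def matching_points(candidate, grid_xs, grid_ys, rect_inside):
    """Enumerate vertex-grid points that form a non-degenerate
    rectangle with `candidate` lying entirely inside the subfigure.
    `grid_xs`/`grid_ys` are the grid coordinates; `rect_inside` is
    an assumed containment test for the rectangle spanned by the
    two corner points."""
    cx, cy = candidate
    found = []
    for x in grid_xs:
        for y in grid_ys:
            if x != cx and y != cy and rect_inside(candidate, (x, y)):
                found.append((x, y))
    return found
```

With an n-vertex polygon the grid has O(n²) points, matching the bound in the text.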

The process so far has produced the following: a set of subfigures; a set of candidate points, one for each subfigure; and a set of matching points for each subfigure, at most O(n²) per subfigure. We will now combine these ingredients into a dynamic programming algorithm for minimum-length rectangulation.

**Dynamic programming algorithm**

Given input polygon P, do the following:

- Produce all subfigures that satisfy the Fact above, and order them by area, from smallest to largest.
- Find the candidate point for each subfigure.
- Loop through the subfigures, in order.
- When considering a subfigure F, loop through the set of matching points of F.
- For each matching point m, calculate the length of the rectangulation of F obtained by drawing the rectangle defined by the candidate point of F and m. This can be done quickly because, earlier in the order, we already computed the minimum length for the smaller figure that is F minus the rectangle.
- Take the minimum over all rectangulations obtained in the previous step. Store that as the minimum-length rectangulation for F.
- Halt when we have computed the value for the maximal subfigure, that is, the minimum-length rectangulation of P itself.
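The steps above can be sketched as a short dynamic program. The dictionary schema is my own stand-in for the geometric bookkeeping; the real algorithm derives the cut lengths and leftover figures from the geometry:

```python
def min_length_rectangulation(subfigures):
    """Skeleton of the dynamic program. `subfigures` is assumed to be
    sorted by area, smallest first; each entry is a dict whose
    "matches" list holds, per matching point, the cut length the new
    rectangle contributes ("cut_len") and the index of the smaller
    leftover figure ("rest", or None if nothing is left over)."""
    best = {}
    for i, fig in enumerate(subfigures):
        costs = [m["cut_len"] + (best[m["rest"]] if m["rest"] is not None else 0)
                 for m in fig["matches"]]
        # smaller figures come first, so best[m["rest"]] is already known
        best[i] = min(costs) if costs else 0
    # the last (largest) figure is the input polygon itself
    return best[len(subfigures) - 1]
```

The smallest-to-largest ordering is what makes each lookup constant time: every leftover figure was solved in an earlier iteration.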

Since there are O(n²) subfigures that satisfy the Fact, and, for each subfigure, there are O(n²) matching points, the total running time of the algorithm is O(n⁴).

Andrzej Lingas, Ron Y. Pinter, Ronald L. Rivest, & Adi Shamir (1982). Minimum edge length partitioning of rectilinear polygons. *Proceedings of the 20th Allerton Conference on Communication, Control, and Computing*, 53-63.

Tagged: computational geometry, orthogonal polygon, rectangular partitioning, rectilinear polygon