Computing with billions of cores

Supercomputer performance and projection. Source:

On November 11, 2010, the Chinese supercomputer Tianhe-1A earned the title of “fastest computer in the world,” when it was clocked performing operations at the rate of 2.57 petaflops.  Two of the top five computers are in China; two are in the USA; one is in Japan.  Two weeks ago, IBM announced plans to build a machine that will surpass them all: ten petaflops, with 750,000 cores.  IBM projects that by 2020, they will be able to deliver machines with millions of cores.

FLOPS stands for “floating point operations per second,” so one gigaflop is one billion floating point operations per second.  A teraflop is is one trillion flops, and a petaflop is one quadrillion (10^{15}) flops.  A floating point operation is an operation like addition or division of two numbers with fractional components.  The processors of PCs are usually measured differently: MIPS (millions of integer operations per second), or cycles per second, e.g. gigahertz.  The relation between a processor’s cycle speed and its flops can be complex, because it depends on pipelining of data and other factors, so some people consider flops to be a better measure of how fast a computer really is when it is trying to solve problems.

 The graphical processing units (GPUs) of desktop PCs are pretty frikkin fast, too.  The AMD Firestream 9270 claims to perform over one teraflop “in raw single precision performance.”  So a high-end home gaming system in 2011 can run about as fast as the world’s fastest computer in 1998.  Perhaps more significantly, my Droid Incredible smartphone houses a Qualcomm Snapdragon processor that runs at 1 Ghz, and Qualcomm plans to release a 2.5 Ghz quad-core chip for use in phones by 2012.  This is critical to our current discussion, because there are now more than five billion cell phones in the world — to say nothing of other mobile devices, like RFID chips — and projects distributed across many computers already run faster than Tianhe-1A.

As a researcher in biomolecular computation, I think about things like, “Suppose we had five hundred million cores, but they could only talk to one another slowly, via diffusion.  What problems might we solve efficiently that we cannot solve now?”  (Erik Winfree in this talk at ASPLOS 2008 made the point that a submarine has 10^6 parts and 10^{10} transistors, while a whale has 10^{17} cells and 10^{27} proteins.)  Only recently, though, I have come to realize that my question doesn’t just apply at a cellular scale: it applies to the very near future of macro-scale computing as well.

Experts in pervasive computing have understood this for years, I am sure, and I imagine I am unfashionably late to the party.  Still, I wonder how many computer scientists — students, teachers, or practitioners — understand this on a gut level.  According to Wikipedia, the distributed computing project Folding @ Home sustained 6.2 petaflops in April 2010, several months before the highest-performance computers in the world, um, couldn’t even break past three petaflops 😉 (A participant in Folding @ Home loads software onto his or her computer(s) that runs a program in the background (e.g., when the computer is idle) that determines a protein’s structure when it folds up.  There is no known fast algorithm for this — indeed, folding is an NP-complete problem — but the problem parallelizes very well.  Hand each user a different molecule, and everyone is good to go.)

In stark contrast to, say, the Turing machine, which was conceptualized many years before the first modern computer was actually built, current computer science theory appears to be struggling to articulate an understanding of this new reality, even as the high-performance engineers and managers of distributed projects are charging ahead.  In 2009, the Workshop on Theory and Many-Cores went so far as to say:

This low level of activity should be a concern to the theory community, for it is not clear, for example, what validity the theory of algorithms will have if the main model of computation supported by the vendors is allowed to evolve away from any studied by the theory.

This problem seems more like an institutional one than a mathematical one to me.  Current university faculty worldwide were educated with a one-processor world view, and it is not as though all interesting problems there have been solved.  So why not work on interesting problems that are already accessible, instead of trying to think a brand new way and risk that the new problems might be less interesting, or still inaccessible?  Still, the sheer computing power available within a few years by, for example, running an app in the background on every cell phone in China, is a prospect I find both intoxicating and otherworldly.  Whoever answers the question I cannot will literally predict the future.


3 responses to “Computing with billions of cores

  1. Oh boy! Congratulations on this wonderful weblog … I have long admired your thoughtful posts on other forums.

    Further data relating to the computational trends that your post notes can be found in Robert Bixby’s article “Solving Real-World Linear Programs: A Decade and More of Progress” (2002); see also slides that the author posts on-line. Bixby’s article (uniquely AFAIK?) considers progress in hardware and algorithms separately, and concludes that they are similarly important.

    Best wishes for happy blogging.

  2. I would not want to run a distributed program on my phone until my battery lasted so long that I never have to think about recharging it. This might happen if my cell phone can wirelessly recharge itself using ambient radio waves!?!

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s