You have 1,000 Cores. Go.
Intel is talking about a radically scalable processor design that can do 48 cores today, and lots more tomorrow.
The architecture for the Intel 48-core Single Chip Cloud Computer (SCC) processor is “arbitrarily scalable,” said Intel researcher Timothy Mattson during a talk at the SC10 supercomputing conference being held this week in New Orleans.
“This is an architecture that could, in principle, scale to 1,000 cores,” he said. “I can just keep adding, adding, adding cores.”
…
For simplicity’s sake, the team used an off-the-shelf 1994-era Pentium processor design for the cores themselves. “Performance on this chip is not interesting,” Mattson said. It uses a standard x86 instruction set.
And so the message of the prophets of functional programming is ratified in part. It’s been a meme in the past several years that radically multicore processors were coming, and that the received models of concurrent programming were unlikely to do a good job utilizing the new hardware. That hardware would be slower (or at best no faster) per processor, but would instead grow the number of processors to deliver more computing power over time. And now that radical multicore is appearing, what of the claim that threading and synchronized mutable data is a dead letter? Well, ask a hardware guy:
As more cores are added to chips, [cache coherency] becomes problematic insofar as “the protocol overhead per core grows with the number of cores, leading to a ‘coherency wall’ beyond which the overhead exceeds the value of adding cores,” the paper accompanying [Intel researcher Timothy] Mattson’s talk noted.
Mattson has argued that a better approach would be to eliminate cache coherency and instead allow cores to pass messages among one another.
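For a sense of what that looks like in software, here is a minimal sketch in Go, using goroutines as stand-ins for cores and channels as stand-ins for the on-die message buffers. The types and the toy workload are invented for illustration; none of it comes from Mattson’s paper.

```go
package main

import "fmt"

// msg is the only thing that ever crosses between "cores": each core
// owns its own data outright, and sharing happens by sending a copy.
type msg struct {
	from    int
	payload int
}

// core owns its local state and communicates only by receiving and
// sending messages; there is no shared, coherent memory to protect.
func core(id int, in <-chan msg, out chan<- msg) {
	local := 0 // private to this core; no other goroutine can touch it
	for m := range in {
		local += m.payload
		out <- msg{from: id, payload: local}
	}
	close(out)
}

func main() {
	in := make(chan msg)
	out := make(chan msg)
	go core(1, in, out)

	// A second "core" feeds work in and reads the replies back out.
	go func() {
		for i := 1; i <= 3; i++ {
			in <- msg{from: 0, payload: i}
		}
		close(in)
	}()

	for reply := range out {
		fmt.Printf("core %d reports running total %d\n", reply.from, reply.payload)
	}
}
```

The point is not the channel machinery; it is the ownership discipline. Each core’s data has exactly one owner, so there is nothing for a coherency protocol to keep coherent.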
Interesting that hardware designers have made the same discovery that we have: to wit, that managing concurrent access to mutable state simply does not scale. We often make the argument as a cognitive one: programmers simply can’t reason effectively about heavily threaded and synchronized code. But there turns out to be a performance boundary as well.
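To put the performance half of that in concrete terms, here is a toy Go sketch (not a benchmark, and not anything measured on the SCC): one version funnels every increment through a single mutex-guarded word, the software analogue of every core hammering the same cache line, while the other lets each worker accumulate privately and merge once at the end.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	workers   = 8
	perWorker = 100_000
)

// sharedCounter makes every worker synchronize on one mutable word,
// so every single increment pays a coordination cost.
func sharedCounter() int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	total := 0
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				mu.Lock()
				total++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

// privateThenMerge gives each worker its own accumulator and takes the
// lock once at the end: coordination per worker, not per increment.
func privateThenMerge() int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	total := 0
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			local := 0 // private; nothing to keep coherent while it is hot
			for i := 0; i < perWorker; i++ {
				local++
			}
			mu.Lock()
			total += local
			mu.Unlock()
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(sharedCounter(), privateThenMerge())
}
```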
A 1,000-core part is an automatic win if you’re writing server software whose unit of work is a computationally mundane request. You parse some text, you do some disk or socket I/O, you sort some lists, you process strings into an output buffer, and nobody gets hurt. But if your computational task is less mundane and actually requires thoughtful coordination of computing resources that doesn’t delegate well to infrastructure like an app server, then trying to manage mutable state by hand will wreck you on the same rocks as Intel’s hardware designers.
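In code, that “automatic win” case looks roughly like the sketch below, assuming made-up request and response types: a pool of workers, one independent request in flight per worker, all mutable state local to its handler, and only inputs and results crossing goroutine boundaries.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// request and response stand in for the mundane per-request work:
// parse some text, sort some lists, write an output buffer.
type request struct {
	id   int
	body string
}

type response struct {
	id     int
	output string
}

// handle owns all of its mutable state; nothing here is shared with
// any other in-flight request, so there is nothing to synchronize.
func handle(req request) response {
	words := strings.Fields(req.body)
	sort.Strings(words)
	var buf strings.Builder
	for _, w := range words {
		buf.WriteString(w)
		buf.WriteByte(' ')
	}
	return response{id: req.id, output: strings.TrimSpace(buf.String())}
}

func main() {
	reqs := make(chan request)
	resps := make(chan response)

	// One worker per "core"; scaling up is just a bigger number here.
	var wg sync.WaitGroup
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for req := range reqs {
				resps <- handle(req)
			}
		}()
	}
	go func() { wg.Wait(); close(resps) }()

	go func() {
		for i, body := range []string{"the quick brown fox", "jumps over the lazy dog"} {
			reqs <- request{id: i, body: body}
		}
		close(reqs)
	}()

	for r := range resps {
		fmt.Printf("request %d -> %q\n", r.id, r.output)
	}
}
```

The scheduling here delegates happily to infrastructure; it is only when requests need to coordinate with each other that the hard problems start.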
Which is not to endorse, say, actors over STM, simply because the hardware in question looks more like one than the other. But it is to say that the prophets of mutability doom are not to be ignored, and you should probably start learning new languages as if you believed them.