Programming for Modern Hardware (required reading)

Sep 25, 2007 08:31

I am a big fan of Herb Sutter's work. He has great books on C++ programming techniques, for example. He recently gave a talk about programming for modern machine architectures. The very short version is that memory latency is the root of (almost) all performance issues. [blog entry] [Google video] [corresponding slides] For a teaser, did you know that ( Read more... )
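
To make the latency point concrete, here is a toy illustration of my own (not from the talk): both functions below sum the same n-by-n matrix stored in a flat vector, but the column-major version jumps a whole row's worth of memory between accesses, so it misses the cache on nearly every access and typically runs several times slower on large matrices, for no algorithmic reason at all.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Row-major traversal: consecutive addresses, so each cache line that is
    // fetched gets used in full before the next one is needed.
    double sum_row_major(const std::vector<double>& m, std::size_t n) {
        double s = 0.0;
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                s += m[i * n + j];
        return s;
    }

    // Column-major traversal of the same data: a stride of n doubles between
    // accesses, so nearly every access waits on a cache miss.
    double sum_col_major(const std::vector<double>& m, std::size_t n) {
        double s = 0.0;
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < n; ++i)
                s += m[i * n + j];
        return s;
    }

    int main() {
        const std::size_t n = 2000;
        std::vector<double> m(n * n, 1.0);
        // Same answer both ways; time each call to see the latency difference.
        std::cout << sum_row_major(m, n) << " " << sum_col_major(m, n) << "\n";
    }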

programming, video, psa


Comments (5)

mtbg September 26 2007, 07:07:38 UTC
"Personally, the only sane solution I see is for application programming to be done at a sufficiently high level of abstraction to encompass all of the concurrency."

I would offer what I believe is a slightly different suggestion: that concurrency should be explicitly supported at the programming-language level. We (the computer science research community) have been working on concurrent languages for a long time, and smart people are still thinking about the problem. For instance, Jayadev Misra (author of "Drinking Philosophers") has some of his minions working on a language called Orc, which embodies some interesting ideas.

Anyway, yes, we live in interesting times.


benfrantzdale September 26 2007, 11:05:17 UTC
Interesting. I've heard of Drinking Philosophers before, but not Orc. From this talk, it sounds as though C++ may soon get a std::atomic class template, although the semantics of that aren't clear yet. I'd imagine it has a way of either giving you a const volatile T& or going through a lock to give you a T&.
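
Purely guessing at the eventual interface, I'd hope using it looks roughly like this (the names here and the threading facilities are just what's being proposed for the next standard; nothing is settled):

    #include <atomic>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main() {
        std::atomic<int> counter{0};            // shared counter, no explicit lock

        std::vector<std::thread> workers;
        for (int t = 0; t < 4; ++t)
            workers.emplace_back([&counter] {
                for (int i = 0; i < 100000; ++i)
                    counter.fetch_add(1);       // atomic read-modify-write
            });
        for (auto& w : workers)
            w.join();

        std::cout << counter.load() << "\n";    // always prints 400000; no data race
    }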

That would deal with correctness, but it still doesn't address some of the performance issues that Sutter mentions; in particular, your parallel performance gets destroyed if two threads need to write to different variables that happen to live in the same cache line. Yikes.
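
Here's a sketch of my own (not Sutter's example) of that cache-line problem. The two structs hold the same pair of counters, but the padded one forces each counter onto its own 64-byte line (a typical cache-line size), which on most multicore machines makes the two-thread loop dramatically faster:

    #include <atomic>
    #include <cstdint>
    #include <thread>

    // Two logically independent counters packed together: they will typically
    // share a cache line, so two threads incrementing them force that line to
    // bounce between cores (false sharing).
    struct Packed {
        std::atomic<std::uint64_t> a{0};
        std::atomic<std::uint64_t> b{0};
    };

    // Same counters, but each aligned to a 64-byte boundary, so the two
    // threads never touch the same cache line.
    struct Padded {
        alignas(64) std::atomic<std::uint64_t> a{0};
        alignas(64) std::atomic<std::uint64_t> b{0};
    };

    template <typename Counters>
    void hammer(Counters& c) {
        std::thread t1([&c] { for (int i = 0; i < 10000000; ++i) c.a.fetch_add(1); });
        std::thread t2([&c] { for (int i = 0; i < 10000000; ++i) c.b.fetch_add(1); });
        t1.join();
        t2.join();
    }

    int main() {
        Packed p;
        hammer(p);   // time this...
        Padded q;
        hammer(q);   // ...and this; the padded version is typically much faster
    }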



bonboard September 28 2007, 04:48:11 UTC
I find that 99% figure to be unlikely.

Sure, you can count the L1 and L2 caches, and some of the MMU, but I don't believe it's 99%.


benfrantzdale September 28 2007, 11:28:41 UTC
On page 9 of the slides he discusses this and cites a source. He says that on an Itanium 2, 85% is L1, L2, and L3 cache, and then (in the video) he goes on to say that the remaining 15% is largely pipelining hardware, branch-prediction hardware, etc.

Unfortunately, neither Sutter nor his source provides a citation for that claim, but it sounds plausible. Unless the number of transistors required to implement the instruction set has grown a lot, the actual processing components would only be getting (relatively) smaller.


bonboard October 9 2007, 21:47:18 UTC
The Itanium 2 is probably the most pathological example of this to date. Of its 1.7B transistors, 1.5B are consumed by godawful amounts of three-level on-die cache. Core 2 Duo and Hammer aren't nearly that bad, nor is POWER6's die, nor is the new SPARC.



