There's lots of Cell Processor mojo happening at Georgia Tech. My colleague
Sean Lee and I teach Cell programming during the last part of our
Multicore and GPU Programming for Video Games class, and my colleague David Bader runs an STI Center of Competence for the Cell, so hitting every Cell talk I could find was a primary goal of my week at GDC.
I spent all day Tuesday in the
Insomniac Games' Secrets of Console and Playstation 3 Programming tutorial (done by
Mike Acton, Engine Director; Eric Christensen, Principal Engine Programmer; Jonathan Garrett, Senior Engine Programmer; Mark Lee, Senior Engine Programmer; and Joe Valenzuela, Engine Programmer). My family came to San Francisco with me, including my almost-one-year-old son. No one had explained to him about the time change from East Coast to West Coast time, and the general chaos of travel had further messed up his internal clock, so his idea of when to sleep and my idea of when to sleep didn't line up too well. Hence, I was too tired to try to follow all the details of the various bits of C and assembly code floating around on the slides; fortunately, the GDC folks e-mailed electronic copies of the slides ahead of time to people who registered for the tutorial, so my strategy was to sit back and try to absorb the big picture, under the theory that I could look over the code in detail later when my brain was in better shape.
On Wednesday I saw two Cell processor talks,
The Playstation 3's SPUs in the Real World - Killzone 2 Case Study (Michiel van der Leeuw,
Guerrilla Games) and
Practical SPU Usage in God of War 3 (Vassily Filippov and Jim Tilander,
Sony, Santa Monica). It was interesting to contrast the approach taken by the Killzone 2 and God of War 3 teams with the approach espoused the previous day by Insomniac. There were some commonalities: all of the teams feel that the SPUs should be thought of as general-purpose processors, not as specialized GPU- or DSP-style chips, and all have custom tools that let them visualize SPU usage during gameplay.
I'm guessing that the slides from all the talks will appear on various company websites at one point or another, so the notes below mostly focus on off-the-cuff comments made by the speakers. I will also aimlessly pontificate a bit.
Favorite quote: "I'd rather hire an EE major than a CS major." - Mike Acton
A question came up near the end of one of the sessions about what the difference between computer science and computer engineering was. I paraphrased my colleague
Wayne Wolf, who likes to say that CompE is where CS meets the laws of physics.
Commonplace Cell strategy vs. the
Insomniac strategy: The K2 and GoW 3 teams both took what seems to be the most common Cell programming approach: start by putting everything on the PPU, and then offload stuff to the SPUs as necessary. The GoW 3 team emphasized making sure all code could compile on both the PPU and SPU; they said that they could then wrap a piece of PPU code with DMA calls and change the compile flag, and could also drop back to the original PPU version for debugging purposes if need be. (I'm not sure about this approach; although the SIMD intrinsics available on the PPU and SPU are similar, they're not exactly the same. There are some intrinsics available on the PPU but not the SPU and vice-versa.) This contrasts quite strongly with the Insomniac approach, which focuses on the SPUs immediately. Most Cell programmers seem content to let major crunching remain on the PPU; in the Insomniac viewpoint, the PPU should just be directing traffic, and the SPUs should be thought of as the heart of the Cell, not as "coprocessors."
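To make the GoW 3 description concrete, here's a rough sketch of what I imagine that kind of dual-target code might look like. This is my own hypothetical illustration, not code from the talk (the buffer sizes, names, and the scale_positions routine are all made up): the core routine compiles unchanged on both processors, and the SPU build wraps it with DMA calls to pull the data into local store and push the results back.

```cpp
// Hypothetical sketch of a routine that compiles for both the PPU and the SPU.
#include <stdint.h>

// Core routine: scale an array of floats in place.  Compiles unchanged on
// both processors.
static void scale_positions(float* p, unsigned count, float s)
{
    for (unsigned i = 0; i < count; ++i)
        p[i] *= s;
}

#ifdef __SPU__              // defined when targeting the SPU
#include <spu_mfcio.h>

// SPU wrapper: DMA the data into local store, transform it, DMA it back.
// Buffers must be suitably aligned and transfer sizes a multiple of 16 bytes.
static float ls_buf[1024] __attribute__((aligned(128)));

void scale_positions_spu(uint64_t ea, unsigned count, float s)
{
    const unsigned tag = 1;
    mfc_get(ls_buf, ea, count * sizeof(float), tag, 0, 0);  // pull data in
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();                              // wait for the DMA
    scale_positions(ls_buf, count, s);
    mfc_put(ls_buf, ea, count * sizeof(float), tag, 0, 0);  // push results back
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();
}
#else
// PPU build: just call the routine directly on main memory (the debugging
// fallback the GoW 3 team described).
void scale_positions_ppu(float* p, unsigned count, float s)
{
    scale_positions(p, count, s);
}
#endif
```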
I'm not a fan of the Cell architecture (I'm going to keep teaching it anyway, since the PS3 will be around for quite a while, and many of my former students tell me that having Cell experience on their resumes has landed them interviews), but assuming you need to program on the Cell, I think Insomniac is doing it right and is way ahead of the curve. I find it somewhat ironic that Insomniac, a third-party developer (admittedly one with a long and close relationship with Sony), is more on top of the Cell than the Sony folks doing GoW 3. (It does make sense, though; Insomniac did Resistance: Fall of Man, which was a PS3 launch title, whereas the last products by the GoW folks were PS2 and PSP titles, so this is probably their first journey onto the PS3.)
Insomniac has a
fantastic R&D page, where they reveal a lot of their special sauce. Anyone working on the Cell in general and the PS3 in particular should pay close attention to what they post there.
One of the most interesting notions to come out of the Insomniac tutorial was that of "SPU shaders." By shader, they mean small bits of SPU code that get DMAed to the SPU along with the data. SPU shaders can load up other SPU shaders, in which case the original shader can remain in the local store if desired. They have a custom loader that hangs out permanently in each SPU's local store and directs the loading of these bits of SPU shader code; this loader only takes up about 2K.
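I have no idea what Insomniac's actual implementation looks like, but here is a purely conceptual sketch of the general idea as I understood it: a tiny job descriptor tells the resident loader where the shader code and its data live in main memory, and the loader DMAs both into local store and jumps into the code. Every name here is hypothetical, and a real loader would be far more careful.

```cpp
// Conceptual sketch only -- not Insomniac's implementation.
#include <stdint.h>
#include <spu_mfcio.h>

struct ShaderJob {               // hypothetical job descriptor
    uint64_t code_ea;            // main-memory address of the shader code blob
    uint32_t code_size;          // size of the code blob (multiple of 16 bytes)
    uint64_t data_ea;            // main-memory address of the data it works on
    uint32_t data_size;
};

static char code_buf[16 * 1024] __attribute__((aligned(128)));
static char data_buf[16 * 1024] __attribute__((aligned(128)));

typedef void (*ShaderEntry)(void* data, uint32_t size);

void run_shader(const ShaderJob& job)
{
    const unsigned tag = 2;
    mfc_get(code_buf, job.code_ea, job.code_size, tag, 0, 0);  // fetch the code
    mfc_get(data_buf, job.data_ea, job.data_size, tag, 0, 0);  // fetch the data
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();          // wait for both transfers

    // Assume the blob is position-independent code with its entry point at
    // the start; jump into it with the data it should transform.
    ShaderEntry entry = (ShaderEntry)(void*)code_buf;
    entry(data_buf, job.data_size);
}
```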
The Insomniac folks are rather distrustful of automated scheduling tools; they noted that they don't really use malloc, preferring to handle memory management by hand. An automatic scheduler multiplexes time the way malloc multiplexes memory; just like they don't trust automatic allocation of memory, they don't trust automatic allocation of time. They have their own custom scheduling tools, and do not rely on things like SPURS.
A little side jaunt of PS3 vs. Xbox 360 flamebait: Over the course of the week - and from talking to console programmers in general - I've noticed that PS3 coders seem to most often talk about being GPU bound, and Xbox 360 coders seem to most often talk about being CPU bound.
One common thread I've noticed is that people working on the PS3 seem to often offload graphics work onto the SPUs, including things that would be much more naturally done in GPU shader code. It seems that it's pretty hard to max out the 360's Xenos GPU, whereas it's relatively easy to hit the limit of the PS3's RSX. People usually attribute this primarily to the unified shaders of the Xenos, vs. the dedicated vertex and pixel shaders of the RSX (which is largely based on the NVIDIA 7800, from what I understand). However, while I think that's part of it, I don't think that's the main reason. I think the real special sauce of the Xbox 360 is the 10 megabytes of embedded frame buffer RAM built into the GPU, which the rasterization hardware can write to ridiculously quickly, and which has logic for z-buffering, alpha blending, stenciling, etc., built right into the RAM. I know of no PC video card by NVIDIA or ATI that has that kind of structure.
My general impression is that the CPU (Xenon) and GPU (Xenos) for the 360 were designed with each other in mind. The 360 GPU would be quite difficult to put on a regular PC graphics card, since it has the 360's memory arbitration hardware built into it and has a design rather specific to the overall 360 architecture, whereas I think you could readily make a PC graphics card out of the RSX. It reminds me of Seymour Cray's quote (paraphrasing) that it's not enough to just design a fast processor - you have to design a fast system.
I've heard a rumor (I have no idea how true this is) that Sony originally intended to put two Cells in each PS3 and not have a dedicated 3-D processor. The thought was that one Cell would primarily handle graphics, and the other Cell would primarily handle game code (as on the actual PS3). However, the Cell turned out not to be as fast at being a GPU as a dedicated GPU, so they dropped the second Cell and added the RSX. Again, I have no idea if that's true, but it makes a kind of sense in that it parallels the dual Vector Unit architecture of the Emotion Engine, where VU1 is hooked to the rasterization circuitry of the Graphics Synthesizer and hence is intended primarily to handle graphics, and VU0 is primarily intended to handle physics, etc., but you have some flexibility in terms of what handles what - you could run some physics on VU1 and some graphics on VU0 if you wanted. Although I don't know nearly as much about the PS3 architecture as I do about the Xbox 360 architecture, my overall impression is that the RSX feels "tacked on" to the PS3 - actually, kind of like in a regular PC - compared with the way the Xenos integrates with the Xenon.
Sony, Toshiba, and IBM put a massive amount of money and time into developing the Cell. The above discussion makes me wonder - what if they had put part of that effort into developing a Cell-specific GPU, something designed to optimally integrate into the Cell architecture? Or, for that matter, what if they had simply put a more powerful GPU in the box, something more along the lines of the NVIDIA G80 series than G70? What if they had played the embedded fast DRAM card that the Xbox 360 played? In any of these cases, I expect the PS3 would indeed trample the 360.
But as it is, the "which console is more powerful" question turns out to be a wash. I suspect both systems, when taken as a whole, are equally powerful. The Cell might, in the hands of skilled programmers who understand its architecture well, be more powerful than the Xenon, but the Cell is hamstrung by a weaker GPU. PS3 developers are always talking about offloading processing from the RSX to the SPUs, whereas you rarely hear Xbox 360 programmers saying the same thing. Most of the "crunch" of modern games is in the graphics, and the Xbox 360 graphics system is extraordinarily good at pushing polygons.
When I talk to programmers working on cross-platform development, they always talk about how much easier the Xbox 360 is to program than the PS3, and how far ahead the Xbox 360 version is of the PS3 version. I've yet to hear from a multiplatform developer who prefers to work on the PS3 over the 360. Part of this seems to be that the Microsoft tools are more mature than the Sony tools. (Again, I am going by rumor here; I have no experience with the real devkits of either box. Your mileage may vary.) Part of this is that programmers have a hard time dealing with the small local stores of the Cell SPUs and have difficulty wrapping their heads around all the DMA calls needed to manually shuttle data around (although Acton pointed out that this really isn't much more complex than calling memcpy). However, now that the PS3 has been out for several years, I don't think developers can really use "it's just too hard" as an excuse anymore. If you're a publisher and one developer is happy to develop for both platforms but another says they can't do a PS3 version because it's "too hard," it should be obvious which one you should go with.
If it were the case that one console was more powerful than the other, you'd never find that out from companies doing cross-platform development, as they generally try to get their games to play the same on both platforms. Only companies that focus on one platform will really be able to max it out. But then, you'd never really know if you couldn't do it on the other platform! So it's a bit of an unobservable.
That all said, based on the SPU usage visualizations I saw flashing up during the various talks, I suspect we haven't seen games that fully max out what the PS3 can do yet, whereas I suspect that programmers have essentially maxed out the Xbox 360. This somewhat parallels the trend from the last generation - early Gamecube and Xbox games look much like later Gamecube and Xbox games, whereas PS2 games showed steady improvement over the course of its lifecycle as programmers learned to wrestle with the Emotion Engine. (See the original Bloodrayne vs. Bloodrayne 2 for a good example; although the Kojima keynote reminded me just how amazing Metal Gear Solid 2 looked considering how early it came out in the PS2's lifecycle.) For instance, God of War looks way better than I would expect would be possible on a PS2 just looking at the specs, and the programmers for God of War have said that they believe they maxed out the PS2.
My prediction is the first team to fully max out the PS3 will be Insomniac, with Resistance 3.
"Solution for cross-platform devs: start with the PS3 version." When I hear people say things like this, they're usually thinking from a strict project scheduling viewpoint: if the PS3 team seems to always be lagging the 360 team by six months, start the PS3 development six months early. However, this viewpoint misses the benefits of the natural side effects of Cell programming discipline.
I've seen Mike Acton speak a couple of times now, and he is always saying "data is more important than code," and that to get decent code, you need to first think carefully about what data you have and what transformations need to take place on that data. Acton emphasizes understanding how your data flows through the system, and whittling the description of each transformation down to the specific pieces of data it actually needs. He feels that "domain-model design" is a lie, and that object-oriented design often creates more problems than it solves: "There's no reason to model your data after some abstraction in your head about how the world works; you need to model data on how you need to transform it." Acton said that a good understanding of your data will naturally lead to good code, but a bad understanding of your data will naturally lead to crappy code. He said: "If you're spending time worrying about the organization of your code, step back and spend at least half that time thinking about your data." If you're having trouble formulating your code, it probably means you don't really understand your data and how it flows. Although the "data is more important than code" philosophy is particularly important when dealing with the Cell, it's important when doing programming in general.
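Here's a tiny made-up illustration of the kind of shift Acton is talking about (my example, not his): instead of updating heavyweight "domain model" objects one at a time, keep just the fields the transformation needs in tight arrays and stream over them.

```cpp
// "Domain model" style: every update drags cold fields through the cache
// because they sit interleaved with the hot ones.
struct GameObject {
    char  name[64];          // cold data the update never touches
    int   ai_state;          // more cold data
    float pos[3];
    float vel[3];
};

void update_objects(GameObject* objs, int n, float dt)
{
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < 3; ++k)
            objs[i].pos[k] += objs[i].vel[k] * dt;
}

// Transform-oriented style: keep only the fields this transformation needs
// in tight arrays, so the loop touches exactly the data it uses.
struct Positions  { float* x; float* y; float* z; };
struct Velocities { float* x; float* y; float* z; };

void integrate(Positions p, const Velocities v, int n, float dt)
{
    for (int i = 0; i < n; ++i) {
        p.x[i] += v.x[i] * dt;
        p.y[i] += v.y[i] * dt;
        p.z[i] += v.z[i] * dt;
    }
}
```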
A point Acton kept emphasizing was that if you've organized your data and computations so that they run well on the Cell, they will automatically be well organized for other architectures. Doing your development on the Cell first forces you to think about these issues in a way that you might not if you started on another architecture. I imagine that making sure your data structures fit nicely in the SPU local stores means that they will automatically fit nicely in the Xbox 360 caches, and writing your routines so that they operate on these small chunks of data means that your code will rarely need to access data outside the cache. I also imagine that having to manually program the DMA transfers on the Cell will lead you to naturally find places where you would want to set up a cache prefetch on the Xbox 360.
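For example (again my own speculation, not anything from the talks), the "work on one block while fetching the next" habit that SPU DMA forces on you maps naturally onto software prefetching on a cache-based CPU. Here's a sketch using GCC's __builtin_prefetch as a stand-in for a platform-specific prefetch intrinsic:

```cpp
// Process an array while hinting that data a fixed distance ahead will be
// needed soon, so the loads overlap with the computation.
void scale_with_prefetch(float* data, int n, float s)
{
    const int AHEAD = 64;                          // prefetch distance in elements
    for (int i = 0; i < n; ++i) {
        if (i + AHEAD < n)
            __builtin_prefetch(&data[i + AHEAD]);  // start pulling in a later cache line
        data[i] *= s;                              // work on the current element
    }
}
```

The prefetch distance plays roughly the same role that the double-buffer size does on the SPU.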
Looking at the above paragraphs, I realized that I've missed an educational opportunity in my
Multicore and GPU Programming for Video Games class. I've generally talked about the Cell as something you need to fight against when programming; I'm now realizing a better (and more motivating) approach would be to emphasize the way the Cell forces you to think about your data and the ways your data needs to be transformed.
Focusing on the platform: Acton felt that "software is not a platform" - even if you're writing in a scripting language for the web, you want to understand what different types of hardware it might be running on: "Software doesn't live in the aether, it always runs on real hardware."
The Null Fairy: Acton spent some time railing against "the null fairy." He said programmers are always checking whether pointers are null. Nothing should be coming in and making your pointer null; you should know what routines are calling your code, and be able to make guarantees, at a higher level, about what those routines are passing to your code. This was part of a general discussion about error checks. If you can't make guarantees about the data that your code is given, it may be a symptom that you don't fully understand your data flow. He felt that if you really need to check that the data coming in is correct, then your code probably cannot be optimized.
Acton's focus on the null fairy emphasizes a big difference between game programming and other kinds of programming: all the programming effort is going into making the game. In most programming contexts, any code you write may wind up being reused in some other project, so you don't necessarily know what will be calling your code, and particularly in a web services context, something might be trying to feed your code bad data in order to try to exploit a bug. I teach at a university, and we teach our students to religiously check for errors on inputs, so we probably contribute to the null fairy.
I wonder if there's a useful compromise: use compiler flags to enable null-pointer checks during debugging - since uninitialized pointers are likely to crop up while you're writing code, and it's better to catch them cleanly than to get raw memory corruption - and then take the checks out once things stabilize.
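Something like the following would do it (my sketch; Player and apply_damage are made-up names): the check fires in debug builds and compiles away entirely when you build with NDEBUG defined.

```cpp
#include <assert.h>                  // assert() becomes a no-op when NDEBUG is defined

struct Player { float health; };     // hypothetical type for illustration

void apply_damage(Player* player, float amount)
{
    assert(player);                  // catches the null fairy in debug builds only
    player->health -= amount;
}
```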
Can object-oriented programming survive on these new high-performance architectures? People working with SIMD architectures talk about "structure-of-arrays" vs. "array-of-structures." The natural OO approach leans towards array-of-structures, where each structure has, say, an x, y, z, and w component (say, for homogeneous coordinates in graphics). GPUs store things this way, but remember that they're already streaming. To get good performance out of a 4-way SIMD processor, it's best to store four x's in one vector, four y's in another, etc. - this is a structure-of-arrays. This is the main reason some CPU SIMD instruction sets deliberately leave out a dot product instruction; it's to encourage people to use structure-of-arrays. But this forces you to break up your "objects" in unnatural ways - well, at least from an "object-oriented" standpoint, though as you'll notice from the notes above, Acton is not a big fan of large-scale OO. It will be interesting to see how these issues play out in the long term.
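To make the layout difference concrete, here's a small illustration of my own, using SSE intrinsics as the example 4-wide SIMD instruction set (the arrays are assumed to be 16-byte aligned with length a multiple of four):

```cpp
#include <xmmintrin.h>

// Array-of-structures: the "natural" OO layout, one point per struct.
struct Vec4 { float x, y, z, w; };

// Structure-of-arrays: four consecutive x's sit next to each other, so one
// SIMD load grabs the x coordinate of four different points at once.
struct Points {
    float* x;   // arrays of length n
    float* y;
    float* z;
};

// Translate every point, four points per iteration.
void translate(Points p, int n, float dx, float dy, float dz)
{
    __m128 vdx = _mm_set1_ps(dx);
    __m128 vdy = _mm_set1_ps(dy);
    __m128 vdz = _mm_set1_ps(dz);
    for (int i = 0; i < n; i += 4) {
        _mm_store_ps(&p.x[i], _mm_add_ps(_mm_load_ps(&p.x[i]), vdx));
        _mm_store_ps(&p.y[i], _mm_add_ps(_mm_load_ps(&p.y[i]), vdy));
        _mm_store_ps(&p.z[i], _mm_add_ps(_mm_load_ps(&p.z[i]), vdz));
    }
}
```

Notice that with this layout a dot product naturally computes four dot products for four different points at once, rather than one dot product within a single point - which is exactly the usage pattern the instruction-set designers were trying to encourage.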