For two consecutive nights, I slept on the floor of a vault. This was in 1997, if I recall correctly. At that time, I was one of the few Microsoft Certified Exchange Specialists in the area. (I may have been the only one at that point.) I had just come off designing and beginning the rollout of the worldwide email system for global fiber-optic pioneer MFS Communications. (Before they got absorbed by WorldCom.) This vault belonged to a significantly smaller and less far-flung enterprise. As a public utility serving a few million people spread across tens of thousands of square miles, however, they were a little prickly about their data center. That explains the vault.
The reason for my nocturnal pseudo-imprisonment in said vault was something relatively insignificant that nonetheless managed to cripple their email system for days. In fact, it took us a few days just to ferret out what the problem was. My series of phone calls to Microsoft’s technical support generally ended up escalating to the Exchange development team themselves.
In the end, we found the cause of the problem by tracking back through the path of damage it had done. It came down to the RAID controller (the hard drive controller, for those who don’t do server hardware). Specifically, something called a “lazy write cache” was in effect. Rather than the drive controller taking one piece of data from the database engine, writing it, verifying that it was written properly, and then asking for more work, it says, “Just give it all to me… I’ll get to it when I can.” The problem with this approach is that, if there is a hiccup in writing the data, the drive controller can’t report back to the database and say, “Uh, dude? Like… we had a bit of a problem… can you give me that one piece again?”
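To make that concrete, here is a minimal sketch of the difference. This is purely my own illustration; the function names and structure are invented and have nothing to do with the actual controller firmware or Exchange:

```cpp
#include <cstdint>
#include <queue>
#include <vector>

using Block = std::vector<uint8_t>;

// Placeholder stand-ins for the real drive operations.
bool writeBlockToDisk(const Block&)  { return true; }
bool verifyBlockOnDisk(const Block&) { return true; }

// Write-through: the database hands over one block, finds out immediately
// whether it landed safely, and can resend it if it didn't.
bool writeThrough(const Block& block)
{
    return writeBlockToDisk(block) && verifyBlockOnDisk(block);
}

// Lazy write (write-back) cache: the controller accepts everything up front
// and reports success before anything is actually on the platters.
std::queue<Block> cache;

void lazyWrite(const Block& block)
{
    cache.push(block);   // "just give it all to me... I'll get to it when I can"
}

void flushCacheWhenConvenient()
{
    while (!cache.empty())
    {
        // If this write hiccups, the database has already been told all was
        // well; there is no way to ask it for that one piece again.
        writeBlockToDisk(cache.front());
        cache.pop();
    }
}
```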
“Being out of sync sucks.”
In the situation with my client, the ramification of this was a massive database corruption that started with one tiny blurb of data and spread throughout their whole email system with undirected but amazingly annoying persistence. So I slept on the data center floor in 20-minute chunks, entombed by mind-boggling amounts of concrete and steel, restoring from mind-numbingly slow backup tapes one day further into the past at a time, until we could find the day when the corruption actually started.
To sum up, the cause of the whole problem was that the database and the hard drives were allowed, even for a moment, to be out of sync. That RAID controller was saying, “Trust me… all will be well in the end.” Well… we now know how well that worked out. The moral of the story? Being out of sync is a disaster waiting to happen.
Old McDonald Had a Core…
I am reminded of this delightful adventure because of a recent thread (which borders on being a spectacular rant) on the AiGameDev.com forums about a certain hardware manufacturer’s outrageous lies about game AI (free forum registration and introduction required to view). The participants in the discussion lash out at what seems to be Intel’s implication that game AI sucks and will continue to do so until we can have dozens, hundreds, or even thousands of cores to play with. I will let you read the responses that ensued in the forum itself.
In the beginning, all we had to worry about was a single processor. Everything was done on that one processor — with the eventual exception of stuff that was offloaded to graphics cards. All the different parts of the game had to share that single processor — input, logic, rendering, etc. Anything you could think of. With the advent of Windows (et al), we were introduced to the idea that multiple programs could be running on a single processor at one time. However, we still thought of a program (be it game or otherwise) as being a single stream of commands run in order.
The idea of multiple threads for the same application in a single-processor environment sometimes seemed a little redundant. After all, the different threads had to take turns on the same processor anyway. The only benefit at that time was that NiftyTaskA could interrupt NiftyTaskB even before NiftyTaskB was done. One common use for this was to keep sometimes-sluggish world-processing calculations from making your control scheme visibly sporadic. The bottom line, however, is that everything was sharing the same processor no matter how many threads there were. In fact, you could actually slow things down just because of the overhead involved in switching between threads.
Here a Core, There a Core…
And then came the much-vaunted “Dual Core” systems. And I’m not entirely sure that the game world was ready for them. As much as we like to think we are on the cutting edge of technology, there were a startling number of games that were not at all “optimized for multiple cores.” And why were they not optimized? Largely because they were running in a single thread. I remember playing the same RPG (which shall remain unidentified) on my old laptop (P4, 2.6 GHz) and then on my new one (dual core, 1.7 GHz) and being startled that it was running slower. C’mon… isn’t (1.7 × 2) > 2.6?
A quick peek at the performance monitor revealed that the game was beating the snot outta one processor while the other one skipped along at a leisurely 3%, seemingly unaware of the flurry of activity that its neighbor was churning out. So, in effect, the game was running at 1.7 GHz. Wonderful. I wanted desperately for them to simply cut the input over to a separate thread so my cursor didn’t stutter around the screen like it was on drugs. (Or like I was.) Thankfully, they eventually released a patch that was “optimized for multi-core machines”. (Brilliant!)
Everywhere a Core! Core!
For the most part, the only benefit that was garnered from multiple cores was that some of the OS and background applications were split off so as not to compete with the game (at least not as much). But that benefit can only go so far. Even with processor hogs such as Vista and the seemingly endless onslaught of “completely necessary stuff” that wants to install itself in our system trays, the effect on the processor is loose change relative to the electronic bombardment of a modern PC game.
The same can be said for consoles. With the emergence of Sony’s use of SPEs in the multi-core PS3, there is a lot of talk about how to use them efficiently. (Of course, there is also a lot of griping about how each SPE only has 256 KB of local memory to work with.) One way of using them is to process little chunks of information on a temporary basis. Wow… doesn’t this look like multi-threading?
So now that we are likely going to be surrounded by billions and billions of cores (nod to Carl Sagan), what good will they do us? I suppose that you could eventually put one cute little system tray app on each one so they don’t fight with each other. And some of the crap that I’m running right now uses a startling number of threads (a glance at my Task Manager shows Outlook using 32, Skype with 27, and Steam with 61?!?). But splitting up our games into even half this many threads still frightens us as game developers. And yet, will we need to do it to keep expanding in a technology world that may not care that we would prefer a single processor running at 32 GHz rather than 32 processors running at 1 GHz?
So Where Do We Cut, Doctor?
Regardless of platform or architecture, the issue that keeps coming up is “where do we split that single homogeneous flow?” The reason that it is so scary is the same moral that we gleaned from my nights behind two-foot-thick steel doors: “Being out of sync sucks.” And yet that’s what we face when we start breaking things off from the main data flow.
One of the advantages of having everything in one thread was that our game processes were forced to march in order. NiftyTaskA. NiftyTaskB. NiftyTaskC… every time through the game loop, it was the same parade. Boring, yet rigidly predictable. You knew that NiftyTaskB could depend completely on everything in NiftyTaskA being completed. There is a sense of security in that. Having a separate thread for just moving the cursor around the screen is one thing; after all, we are just going to cache any input until such time as it is needed in the next frame. Anything that has to do with the data, however, gets a little more dicey.
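As a rough sketch of both halves of that idea: the input lives on its own thread and simply caches whatever it sees, while the parade of NiftyTasks stays in its strict order on the main loop. (Everything here beyond the task names from the paragraph above is invented for illustration.)

```cpp
#include <atomic>
#include <mutex>
#include <thread>

struct InputState { float cursorX = 0, cursorY = 0; bool firePressed = false; };

// Placeholders for the platform's input API and the game's real work.
InputState pollHardware()          { return {}; }
void NiftyTaskA(const InputState&) {}
void NiftyTaskB()                  {}
void NiftyTaskC()                  {}

std::mutex        inputMutex;
InputState        cachedInput;      // written by the input thread, read each frame
std::atomic<bool> running{true};

// Input gets its own thread so a sluggish frame can't make the cursor stutter.
void inputThread()
{
    while (running)
    {
        InputState polled = pollHardware();
        std::lock_guard<std::mutex> lock(inputMutex);
        cachedInput = polled;       // cache it until the next frame wants it
    }
}

void gameLoop()
{
    for (int frame = 0; frame < 1000 && running; ++frame)
    {
        InputState frameInput;
        {
            // Take a snapshot of whatever input has been cached so far.
            std::lock_guard<std::mutex> lock(inputMutex);
            frameInput = cachedInput;
        }

        // The rest still marches in its boring, rigidly predictable parade,
        // so NiftyTaskB can depend completely on what NiftyTaskA just did.
        NiftyTaskA(frameInput);
        NiftyTaskB();
        NiftyTaskC();
    }
    running = false;                // tell the input thread to wind down
}

int main()
{
    std::thread input(inputThread);
    gameLoop();
    input.join();
}
```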
“Debugging across threads? You get used to it.”
In 2003, my late colleague and friend Eric Dybsand glibly shrugged off questions about multi-threading. He stated quite simply that he typically offloaded an A* pathfinder to a separate thread so that agents could request paths without bogging down the rest of the game loop. In that same conversation, Eric even fielded the question of debugging data in a multi-threaded environment with the calm response of “you get used to it after a while.” (I’m waiting for Eric to respawn so that I can ask him more about his techniques in this capacity.)
Needless to say, Eric’s proclamations alarmed many people. “What if…?” was the common cry. The thought was that, by the time a path was returned, it would no longer be valid. I won’t get into the array of questions and answers here. (Entire books could be written on the caveats.) Even something as simple as pathfinding, which only reads world data rather than changing it, causes some significant consternation. Imagine what would happen if we wanted to go even further than that.
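For what it’s worth, something in the spirit of what Eric described might look like the sketch below: agents drop requests into a queue and keep moving, and a pathfinder thread fills in results as it gets to them. (This is my own illustration of the idea, not his code, and runAStar stands in for whatever the real search would be.)

```cpp
#include <atomic>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Placeholder for the real A* search over the world data.
std::vector<int> runAStar(int startNode, int goalNode) { return { startNode, goalNode }; }

struct PathRequest
{
    int               agentId;
    int               startNode;
    int               goalNode;
    std::vector<int>  result;          // filled in by the pathfinder thread
    std::atomic<bool> done{false};     // the agent polls this on its next update
};

std::mutex               queueMutex;
std::condition_variable  queueSignal;
std::deque<PathRequest*> pending;
bool                     shuttingDown = false;

// Called from the main game loop: drop the request off and keep going.
void requestPath(PathRequest* req)
{
    {
        std::lock_guard<std::mutex> lock(queueMutex);
        pending.push_back(req);
    }
    queueSignal.notify_one();
}

// The pathfinder thread chews through requests whenever there is work.
void pathfinderThread()
{
    for (;;)
    {
        PathRequest* req = nullptr;
        {
            std::unique_lock<std::mutex> lock(queueMutex);
            queueSignal.wait(lock, [] { return shuttingDown || !pending.empty(); });
            if (shuttingDown && pending.empty())
                return;
            req = pending.front();
            pending.pop_front();
        }

        req->result = runAStar(req->startNode, req->goalNode);
        req->done   = true;   // and, yes, by the time the agent reads this,
                              // the world may have changed: that's the "what if?"
    }
}
```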
The Needs of the Many…
I do have to give Intel’s claim some credit in this respect… as AI programmers, we always gripe about not having enough processing power at our disposal. However, in the past, we could rely on Dr. Moore to come to the rescue by bestowing more processor speed on us every 18 months or so. That may no longer be something we can rely on… at least not in a purely linear sense. Flipping open your latest PC catalog at any point over the past few years reveals that processor clock speeds have somewhat leveled off. Instead, we see more processors, sometimes even running slower than their single-core predecessors. Sure, we have more “processor time” available to us… it’s just no longer in one place. If we want to use it, we are going to have to get over our industry-endemic fear of threads.
So… what is the possibility and/or feasibility of multi-threading? Is it inevitable? Is it necessary? Does it scare the heck out of you? Or have you given it a shot and used it successfully? Can we do it without getting “out of sync?” Or, for that matter, does being out of sync really suck that badly?
(Alex saw my topic for this week and warned that it might be a bit touchy. So, after I hit save, I think I may pathfind myself to go hide in a big data center vault for a few days.)