I have a 13-year background in relational database design. For those that aren’t familiar with the basic tenets of relational database models, here’s one of the major principles. You cluster like data in tables. Records in any given table can point to related records in other tables. You don’t necessarily have to point to other data, but if you do point to data in another table, it damn well better be there. In fact, the technical term for such a rule is “referential integrity.” If a record gets deleted, anything that points to it from other tables needs to be deleted as well. Thankfully, the tools in such platforms as Access and SQL Server take care of a lot of the heavy lifting in that regard… often as simple as clicking boxes such as “Enforce Referential Integrity” and “Cascade Delete”.
Coming from that database background, it was a peculiar transition to not have that model in place for me. As I started working on my very data-intensive airline management simulation in C++, I tossed about between various models of how to arrange my classes and objects. I had large collections of like items and there were certainly well-defined relationships between different types of items. I longed for the simplicity of a database model… in fact, at first it was difficult for me to break out of that mentality.
The two questions that I needed to solve were:
How do I store these collections of information?
How do I reference individual items in that collection from elsewhere in the game?
What a Wide Array!
Obviously, one method that is a staple of programming was to use arrays. Let’s face it, storing data in an array is pretty darn simple. And accessing it later is as easy as referencing the array by index. However, while that worked well for data that had a cap on it like the number of airlines, cities and airports in the game, it faltered somewhat on data that could wildly change in size — such as the number of aircraft, flights or passengers in the game at any point. I hated to define a fixed size array — chewing up an associated fixed amount of memory — for something that may never be used.
“[They] not only changed in numbers but also tended to be rather short-lived.”
On the other side of the spectrum, aircraft, flights and especially the passengers was something that not only changed in numbers but also tended to be rather short-lived… particularly in the case of the passengers. I could place arbitrary limits on the numbers, but they would have to be suitably large to accommodate the maximum at any one time. (In fact, I do place limits on the maximum numbers but it is done via the simulation model rather than the data structure itself… don’t ask.)
Perhaps I should explain something with regards to my passenger model so as to really set the tone of this issue. With my extreme focus on realism in this situation, I am generating hundreds of thousands of passenger blocks (each of which can represent between 1 and about 25 passengers). Sometimes these pax (pardon my abbreviation… when you have typed “passengers” thousands of times, it gets old) will be around for the few moments it takes them to determine that they can’t find a suitable itinerary and then are removed. Sometimes they will be traveling in only a few days or even hours of the simulation time. Sometimes they are booking up to 8 weeks in advance. That means that I needed a data structure that would allow me to have pax records jumping in and out very often.
These pax also needed to referenced individually by the other entities in the game. For example, a flight scheduled weeks into the future had to know who the pax were that had booked on it (to know how full it was, for example). A flight that was about to depart had to keep track of its “pax manifest” so that it could update the states and locations of the pax that were on board. This lends itself intuitively to an array setup. When you want a pax, you look him up by index.
Gimme a Vector, Victor
A nice happy medium was using an STL vector container. It would dynamically expand as necessary to accommodate the growing (and shrinking) lists. I didn’t have to pre-define the numbers and therefore pre-reserve the memory. I could still reference items by number as well. However, there was one further problem.
If I used a vector or an array, however, things would start out nice and contiguous as I generate my initial records… but eventually I would have more holes than filled spots as records got deleted. That was partially dealt with by not removing the record itself but changing its state to “empty” and adding it’s index to a list of empty slots. When I created a new record, I would pop the first record in the “empty” list and store the new record in the referenced slot. That also helped to keep the memory from getting splashed all over the inside of my machine.
I did have one concern with the array/vector-like structure, however. With all of the brutal processing that I do when dealing with that many pax, I worried that search times in an array should start to get costly. When you are working with a few dozen pax blocks, you don’t care that the entire list is 600,000 records long… you just want to go straight to the ones you are concerned with. That led me to store the references to those pax via pointers rather than by their index numbers.
Could You Give Me Some Pointers?
Of course, pointers have their own caveats… especially with regard to saving and loading. As we all know, if I were to save and reload the game, there is no guarantee (and that’s putting the odds kinda lightly) that a particular pax record is going to end up at exactly the same place on my silicon. Let’s face it… it simply won’t end up in the same spot. That means that I would have to serialize the data when I save it into something indexed and then store the index when I save the records for the flights, etc. When I load, I would have to do the process in reverse — “OK, Pax number 117,832… grab a seat over there in the memory and then tell me what memory number is on your chair so I can tell your flight where you are sitting.” While this was not a pleasant prospect, I would rather kill the time on a save or load sequence if it meant speed while running the game.
I’ve toyed around with a hybrid of both. Hold the index number of a pax for reference, but also hold the direct memory pointer for doing the actual labor. That got kinda silly, however.
All Together Now!
For purposes of this column, I’ve actually made things even simpler than they are. For example, my Pax actually contain other objects such as their itinerary which is a list of pointers to the flights they are booked on. It may be empty or it may contain between one and three flights. (I had to cap the number of flights they were willing to take at 3 because the combinatorial explosion on booking itineraries when there are thousands of possible combinations of flights was a complete… well… you know.) So, since the sizes were different, that makes storing the objects in the array not feasible. So I have such arrangements as a CPaxCollection which has a list of pointers to CPax which has (among other things) an object CItinerary which as a list of pointers to CSchedFlight. (All CSchedFlight records are also held in a vector in CSchedFlightCollection.) There’s more where that came from, too.
I don’t claim to have the knottiest problem in game development. I simply used my experience with my game to show some of the pitfalls that I encountered as I progressed. I often thought that there had to be an easier way! How different this all is from just having database tables for Pax, Flights, Itineraries and what-not!
“We are pushing a lot of interrelated data around these days.”
Of course, data structures is certainly a topic that cuts right down to the core of game development in general but also very much to AI. As the numbers of both the entities that we manage and the world parameters that we reference increase in numbers and types, we are continually going to be faced with how best to manage those links. What’s more, we are increasingly relying on layers of abstraction to support such concepts as knowledge representation and group coordination. We are pushing a lot of interrelated data around these days.
There have been a handful of speeches, papers and examples of how to handle these issues. There was even a column in the latest Game Developer magazine by Noel Llopis on “Managing Data Relationships” where he talked about “smart pointers”. And yet no one seems to have the proverbial “silver bullet” in this regard.
What strategies have you employed in your own work to handle complex interactions between large amounts of data? Do you prefer the orderly arrays and vectors or the “straight to the point” pointers? How do you manage serialization? Your thrashed memory? Your migraine?