Automated AI Testing: Unraveling the Combinatorial Explosion

Dave Mark on April 29, 2008

Dave Mark is back this week to introduce our regular developer discussion. When he’s not working on lucrative database contracts, he spends his time developing AI over at Intrinsic Algorithm. Post a comment below and let him know how you think testing game AI code compares to SQL statements!

When I started to write this column over a week ago, I was in the middle of a minor crunch-worthy catastrophe on a project I was doing for a client. I’m working on a database project for a local retailer, and we are in the process of trying to export a very large amount of data to his clients. (This is all relevant - stick with me here!) When we dumped the more than 32,000 result rows from the database, my client decided to quickly check whether a few select items were included. Much to his consternation, he couldn’t find the items he was looking for. In the process of exploring why, I noticed that some of the data in there seemed like it might not be correct. My client looked at my samples and confirmed that, indeed, they were showing errors in dollar amounts that were subtle but significant. And this is where my weekend went to heck… and why Alex covered for me in this column last week. (Again, hang in there - this description is more than an excuse for not being here!)

Ironically, the false starts I made on this column were already outlining its theme - that of automated testing. I didn’t think about the connection until later in the week. It came to mind when my client asked me, “and how do we know that this new run is completely correct?” My short answer truly had to be, “unless we are willing to hand-check every one of the 32,000 rows of data, we can’t be sure.”

What it came down to was that we had to trust all the layers of data sources, queries, and algorithms — some of which were not even in our control. I had to check each step in the process, confirm that it was doing what it was supposed to do, check a quick sample of what it was spitting out, and move on to the next. I had to proceed under the tenuous premise that, if each step along the way was correct, the end result would be as well. As for that black-box data we were getting from elsewhere? Well, we had to trust that the programmer responsible for that had done his due diligence as well. Unless we wanted to check all 32,000 rows by hand and eye. And how do you tell at a glance if the numbers that spill out of a formula are correct or if they are off by some amount? Wow… too bad we didn’t have a way of automating the testing of the data! Or did we?

Algorithmic Lego

While none of the adventure above had anything at all to do with games, AI, bots, agents, simulations or even a lot of mathematical processes, there are some parts that may seem familiar to my gentle readers.

Game developers live in a world where we are often constructing our products in a less-than-holistic way. We often create a number of little black-box processes that fit together like algorithmic Lego. If each block does its job, we end up at the ultimate goal of having created “a behavior” or “a sequence of behaviors”. But we are not just searching for “a behavior” in a particular situation - we want “the specific behavior” that is called for. And that specific behavior may be one of dozens, scores, or even hundreds of possible behaviors. To make matters worse, our little algorithmic Lego blocks don’t always fit together quite as comfortably as we would like. They can twist and turn and morph… and often aren’t really meant to click together at all.

Oh… and to make matters worse? As game developers, we are working in 4 dimensions, not just 3. Our solutions must be correct in the dimension of time - not just in a snapshot slice. Not only do we concern ourselves with sequences that necessarily must occur over time, but the distances measured in time are important too. Did this happen quickly enough? With enough of a delay? In sync with another related but unconnected behavior elsewhere?
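Those timing questions can at least be checked mechanically against an event log. Below is a minimal sketch, assuming a hypothetical telemetry log of (timestamp, agent, behavior) tuples; the behavior names and the 3-second window are invented for the example:

```python
# Hypothetical event log: (timestamp_in_seconds, agent, behavior) tuples.
log = [
    (0.0, "guard_a", "alarm_raised"),
    (1.7, "guard_b", "took_cover"),
    (4.2, "guard_c", "took_cover"),
]

def reacted_in_time(log, trigger, response, window):
    """Did every `response` occur within `window` seconds of the `trigger`?"""
    trigger_time = next(t for t, _, b in log if b == trigger)
    responses = [t for t, _, b in log if b == response]
    return all(t - trigger_time <= window for t in responses)

# guard_c took 4.2 seconds to react, so a 3-second requirement fails.
print(reacted_in_time(log, "alarm_raised", "took_cover", 3.0))
```

A test like this never judges whether the reaction *looked* right - only whether it fell inside the time window, which is exactly the kind of objective slice of a subjective behavior that can be automated.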

Another caveat… Unlike the project I was working on, where the inclusion of certain rows and specific, checkable numbers were the output, in game AI the numbers themselves are only a means to an end. The behaviors we are looking for are sometimes a little more obscure - more subjective than objective. While you can test whether a result number is what you were hoping it would be when it gets spit out of the other end, no automated test can replace the very subjective, sense-based jury of “that looks right”.

How Big IS the Big Bang?

How complex could agent-based AI really get? It’s only a few behaviors after all… I was dealing with 32,000 rows of data culled from a batch of over 400,000 items taken from multiple systems, through multiple processes and multiple filters. Could game AI really be that complicated?

Let’s start with a simple hypothetical:

  • 1 agent with 5 states

  • 5 states with 1 transition each = 5 transitions (never mind that…)

  • 5 states with 1 transition to each of the other 4 states = 5 * 4 = 20 transitions?

  • 5 interacting agents, each with 5 states = 3125 combinations of the agents’ states

  • 3125 agent-state combinations * 4 potential transitions for each of 5 agents (20) = 62500 potential individual transitions at any given moment.

At an average of one state transition per agent per second, over a 5-second period, there are 5^5 = 3125 potential state sequences for each agent. The combination of sequences over those 5 seconds across all 5 agents is… uh… 3125^5 = 298,023,223,876,953,125

So if we change that one parameter for that one transition threshold for that one agent by 0.5%, it’s only a small change, right? If we wanted to test the ramifications of that parameter and how the 5 agents interact over time we would only have to test… how many situations? 298 * 10^15? You know what? Never mind. My 32,000 rows of simple data starts to look attractive.
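For what it’s worth, the arithmetic above is easy to verify mechanically. A few lines of Python reproduce the figures in the text; the constants are just the hypothetical 5-agent, 5-state, 5-second scenario:

```python
# Back-of-the-envelope check of the combinatorial figures in the text.
AGENTS = 5
STATES = 5
SECONDS = 5

state_combos = STATES ** AGENTS                     # 3125 joint agent states
transitions = state_combos * (STATES - 1) * AGENTS  # 62,500 transitions per moment
sequences_per_agent = STATES ** SECONDS             # 3125 sequences per agent
joint_sequences = sequences_per_agent ** AGENTS     # 298,023,223,876,953,125

print(state_combos, transitions, sequences_per_agent, joint_sequences)
```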

Photo 1: An artist’s rendition of what the combinatorial explosion of game AI feels like. Does anyone here want to claim that any one of those millions of stars is out of place?

A few weeks ago in this column, I mentioned that, at an E3 panel discussion, the luminaries Will Wright, Peter Molyneux and David Jones stated that automated testing was an important tool in taming the emergent behavior that was part of their respective games. Automated testing is increasingly a staple of game development; it is a core practice in methodologies such as Agile development. Being able to run unit tests on various systems is a solid production method that helps ensure (or at least increases confidence) that each of the building blocks is performing as advertised. But at what point can you no longer perform such testing? Again, when you get to the subjective nature of the way AI looks, you begin to realize that automated testing is a little dicey.
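As a toy illustration of unit testing one of those building blocks, here is a minimal sketch: a hypothetical guard FSM expressed as a transition table, with tests that pin down individual transitions. The state and event names are invented for the example - the point is only that each little Lego block can be checked in isolation:

```python
import unittest

# Hypothetical 5-state guard FSM: (current_state, event) -> next_state.
TRANSITIONS = {
    ("idle", "enemy_seen"): "alert",
    ("alert", "enemy_close"): "attack",
    ("alert", "enemy_lost"): "idle",
    ("attack", "low_health"): "flee",
    ("flee", "safe"): "idle",
}

def next_state(state, event):
    """Return the next state; irrelevant events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

class TestGuardFSM(unittest.TestCase):
    def test_idle_guard_reacts_to_enemy(self):
        self.assertEqual(next_state("idle", "enemy_seen"), "alert")

    def test_irrelevant_event_is_ignored(self):
        self.assertEqual(next_state("idle", "safe"), "idle")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestGuardFSM)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Of course, this only proves each transition in isolation - it says nothing about whether the 3125^5 possible joint sequences ever *look* right.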

Human vs. Computer?

But if you leave the testing to human eyeballs alone, there are other issues. Can you play through a battle, a level, or an entire game and point to a particular behavior that never showed up? Like the rows that my client found quite by accident, if you aren’t specifically looking for something, you aren’t going to notice its absence. And what of the subtle differences such as the prices that were incorrect in my client’s data? They were a string of numbers that were in the right range - but weren’t exact. We wouldn’t have known unless we looked for it specifically. There are behaviors that may look rightish (to mimic the vernacular of my teenagers) but aren’t exactly right. While that may be quickly testable by a designer on the fly at development time, what about that combinatorial explosion I illustrated above?
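That “noticing an absence” problem is one place where even a trivial automated check beats eyeballs. A minimal sketch, assuming hypothetical telemetry that records every behavior an agent entered during a playthrough, and an invented list of behaviors the designer expects to see at least once:

```python
# Hypothetical playtest telemetry: every behavior the agent entered, in order.
observed = ["idle", "alert", "attack", "alert", "idle", "alert", "attack"]

# Behaviors the designer expects to occur at least once per playthrough.
expected = {"idle", "alert", "attack", "flee"}

# Set difference instantly flags behaviors that never showed up.
missing = expected - set(observed)
if missing:
    print("Behaviors that never occurred:", sorted(missing))
```

A human watching the same playthrough would almost certainly not notice that the agent never fled; the script cannot miss it.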

Magnify a blank page, you see more blank page… just bigger!

How far do you have to wade into that morass until you can be certain that those little innocuous vagaries aren’t going to develop into something completely obnoxious? Can you truly test them all? Can you guarantee that you have checked your work enough times and in a broad enough manner to cover all the contingencies? Let’s face it… no matter what sort of QA budget we secure, it will be nothing compared to the QA of even a hundred thousand users playing through millions of times and encountering hundreds of millions of iterations of events. And, in the days of blogs, Digg and YouTube, you can be quite sure that every little unaccounted-for parameter will be well publicized.

So what’s the answer? Certainly automated AI testing has a place - more so in some genres and some applications than others. Is this something that needs to be explored better, however? And what are some potential solutions to find things that are not there, make sure that behaviors fall within parameters, or look reasonable? And most importantly, how do we make sure that we have explored all the dark nooks and crannies of the potential state space at the far reaches of that combinatorial explosion to make sure that our delicate cosmic balance doesn’t get sucked into an algorithmic black hole?

Join this week’s discussion and post a comment below or in the forums!

Discussion (5 Comments)

Dave Mark on May 2nd, 2008

[This is an automated test of the comment system.] ;-)

kofman on May 3rd, 2008

If the information is tangible, such as data consisting of strings and numbers in an SQL database, then an automated approach is certainly a great solution. If the information is less tangible, such as behavior in a video game, then the only solution is to add more information (rules and behaviors) to the system to avoid the critical problems. As for everything else, they’re merely “features”. It was a good read, Mark.

Dave Mark on May 4th, 2008

Thanks for the reply. I agree that adding constraints is a good approach... but of course, how do you know if your constraints work in all situations? :-) More testing?

gwaredd on May 6th, 2008

I would just like to share some personal experience with automated testing in games ;) Personally I have found unit tests pretty useless for anything other than trivial bugs. We took a snapshot of a bug database halfway through beta, and under 5% [1] of the bugs would have been found by unit tests. This is not to say I think they are bad - just that you don't get the ROI some people claim. Actually, test-driven design does produce pretty robust code in practice, but it is more of a programming methodology than part of a test plan.

I have had good results, though, with smoke tests and what I call 'functional' tests. Smoke tests are quite easy to construct - you run the game and see if it blows up. For example, a simple level-hopper script that loads each level in turn, or one that runs a camera around a spline and records the FPS. These can be automated easily enough and made part of the build process. Depending on the game (and time), you can expand this, such as writing an AI controller for the player to 'play' the game.

Functional tests are specific 'test' scenes that you can use to exercise a particular piece of functionality. A simple example would be an 'asset viewer' that an artist can use to see how their models look in game. Sometimes these can be automated, sometimes it is too much effort - however, at least having a test scene makes it quick for a human tester to check, or for a programmer to work on without the distraction of the rest of the game logic.

OK - these don't give you 100% test coverage, but they do yield pretty good practical results in the field. Just my $0.02

Gw

[1] 5% being a completely non-scientific figure reached by a subjective and probably biased analysis of each bug - i.e. me saying, yep, we may have had a unit test for that one ;)

Andrew on May 6th, 2008

I did read a story, although I can't remember on what site, where the author worked in QA and eventually wrote scripts or mini-programs. In his example, a whole day's worth of manually checking weapon models became a script that output videos, which could be scanned through in minutes rather than hours. That's certainly automation with good time savings - great notes you've put down, Gw! Exactly the same vein of time saving. :)
