Review

Real-time Neuroevolution of Augmenting Topologies in Video Games

Andrew Armstrong on April 17, 2008

This week’s Thursday Theory article looks at academic research in Neuroevolution of Augmenting Topologies (a.k.a. NEAT) developed at the University of Texas. In particular, you’ll learn how it’s applied at runtime in a video game to allow the actors to learn neat behaviors over time by evolution. (You can curse at the pun if you must ;)

The paper reviewed below details how NEAT is applied practically in a game called Neuro-Evolving Robotic Operatives (NERO), a testbed developed for the research. The gameplay is entirely based around teaching robots with a real-time version of the algorithm, which results in a game genre of its own. (Stay tuned to AiGameDev.com for a full review of the game.)

This post was created by Andrew Armstrong with insights and analyses by Alex Champandard. Feel free to post a comment below or in the forums to let everyone know what you think about this paper!

Motivations

From a purely academic perspective, applying NEAT to games and runtime environments was probably a motivation in itself. Such applications often reveal interesting properties about algorithms, even if they yield no direct advances for commercial game AI.

That said, this project has interesting ramifications beyond NEAT, in particular in using stochastic optimization techniques to help create behaviors that are not scripted in the traditional way:

  1. From a gameplay perspective, this approach could lead to new genres of games in which the player trains in-game actors to perform tasks. NEAT turns out to be quite an efficient technique: although it uses neural networks, it starts out with a very simple topology and only expands the search space when doing so proves beneficial. It also performs well at finding beneficial behaviors compared to fixed-topology evolution.

  2. The evolutionary approach also has lots of potential for offline usage (as NEAT was originally intended) to help designers create behaviors using reward and punishment. Since individuals are evaluated one or two at a time, and the next generation is only created after the entire population has been tested, it can take a long time to see any progress. (In the game, there is no way for the player to specify how the NEAT algorithm should run; that could be another game in itself!)

Granted, these ideas are still at the fringe of commercial games… But for the moment let’s take a look at how the system works under the hood.

Classes of Agent Behaviors with NEAT


Figure 1: Classes of agent behaviors evolvable with NEAT.

Contributions

NEAT is similar to other approaches that evolve neural networks with genetic algorithms. The difference is that the evolution modifies the structure of the networks as well as the underlying weights. These neural networks are used to decide which actions to execute (outputs) based on information from the environment (inputs). The overall process ranks different candidate behaviors over multiple generations of agents; over time, selection is driven by accumulated statistics, with each candidate rated on how well it fulfills the requirements set down by the fitness function.
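
To make the structure-plus-weights idea concrete, here is a minimal Python sketch of a NEAT-style genome. It is an illustration only, not code from the paper or from NERO, and the class names, mutation rates and helper details are all assumptions:

    import random
    from itertools import count

    # Innovation numbers are NEAT's historical markings; they let structurally
    # different genomes be aligned for crossover. This counter is illustrative.
    INNOVATION = count()

    class ConnectionGene:
        """One link in the network: evolution tunes its weight and may rewire it."""
        def __init__(self, src, dst, weight):
            self.src = src
            self.dst = dst
            self.weight = weight
            self.enabled = True
            self.innovation = next(INNOVATION)

    class Genome:
        """Starts minimally, with inputs wired straight to outputs."""
        def __init__(self, num_inputs, num_outputs):
            self.next_node = num_inputs + num_outputs
            self.connections = [
                ConnectionGene(i, num_inputs + o, random.uniform(-1.0, 1.0))
                for i in range(num_inputs)
                for o in range(num_outputs)
            ]

        def mutate_weights(self, rate=0.8, power=0.5):
            # The 'ordinary' GA step: perturb existing weights.
            for c in self.connections:
                if random.random() < rate:
                    c.weight += random.gauss(0.0, power)

        def mutate_add_node(self):
            # The structural step that sets NEAT apart: split an existing
            # connection with a new hidden node, growing the topology over time.
            old = random.choice([c for c in self.connections if c.enabled])
            old.enabled = False
            hidden = self.next_node
            self.next_node += 1
            self.connections.append(ConnectionGene(old.src, hidden, 1.0))
            self.connections.append(ConnectionGene(hidden, old.dst, old.weight))

The point of the sketch is simply that both kinds of mutation act on the same genome, so the search space only grows when a structural mutation survives selection.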

To allow agents to learn online, a real-time implementation of NEAT is needed, and that is the core focus of the paper. This is where rtNEAT comes in, as implemented in NERO. Rather than produce a new generation only after all existing members of the population have been evaluated, the simulation puts an artificial timer on each agent's lifespan: when the time is up, the agent's performance is evaluated and it is either discarded or respawned to work again, while dozens of other agents continue to act at the same time.

The loop used by rtNEAT, simplified, is as follows:

  1. Calculate the fitness of all current individuals. For example, if there is a reward for getting close to an object, then actors that get close to an object have a higher fitness than those that do not.

  2. Remove the agent with the worst adjusted fitness, provided it has been alive long enough. Agents are not removed before they have had time to improve, and using adjusted fitness means that agents which are improving rapidly but still have a low total fitness are not culled; individuals with little experience should not be removed too early.

  3. Re-estimate the average fitness of every species now that the worst agent has been removed.

  4. Choose a parent species to create the new offspring, generating a new neural network with genetic operators such as crossover and mutation.

  5. Place the new agent in the world and gather statistics about its performance. Then rinse and repeat.

In NERO, the default lifespan is 10,000 milliseconds (10 seconds) before an agent's progress is evaluated, and the loop runs continually. This means that in a sandbox area the agents can be seen to improve visibly in a short amount of time, whereas the standard NEAT implementation requires a full generation before any results are visible. The main implementation problem is choosing how long agents should stay alive to learn: too long and evolution slows to a crawl, too short and the agents are not given ample opportunity to learn anything meaningful.
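
The following Python sketch illustrates one possible shape of this loop. It is an assumption-laden illustration rather than the NERO implementation; classes such as Agent, Species and World, and helpers like reproduce() and spawn(), are hypothetical:

    import random

    MINIMUM_LIFETIME_MS = 10_000   # NERO's default evaluation window (10 seconds)

    def rtneat_tick(population, species_list, world):
        # 1. Fitness accumulates continuously as agents act in the world; adjusted
        #    fitness shares it within each species so small species are protected.
        for agent in population:
            agent.adjusted_fitness = agent.fitness / len(agent.species.members)

        # 2. Remove the worst agent, considering only those alive long enough.
        eligible = [a for a in population if a.age_ms >= MINIMUM_LIFETIME_MS]
        if not eligible:
            return
        worst = min(eligible, key=lambda a: a.adjusted_fitness)
        population.remove(worst)
        worst.species.members.remove(worst)

        # 3. Re-estimate each species' average fitness without the removed agent.
        for species in species_list:
            if species.members:
                species.average_fitness = (
                    sum(a.fitness for a in species.members) / len(species.members))

        # 4. Pick a parent species in proportion to its average fitness and breed
        #    a new network via crossover and mutation.
        weights = [s.average_fitness for s in species_list]
        parent_species = random.choices(species_list, weights=weights)[0]
        child = parent_species.reproduce()   # crossover + mutation (assumed helper)

        # 5. Drop the offspring into the world and keep gathering statistics.
        world.spawn(child)
        population.append(child)

Because only one agent is replaced per tick, the population keeps acting in the world while evolution happens around it, which is what makes the learning visible to the player in real time.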

Evolved Neural Network Topology


Figure 2: An example of an evolved neural network topology.

Abstract & References

Here’s the abstract for the paper itself:

“In most modern video games, character behavior is scripted; no matter how many times the player exploits a weakness, that weakness is never repaired. Yet if game characters could learn through interacting with the player, behavior could improve as the game is played, keeping it interesting.

This paper introduces the real-time NeuroEvolution of Augmenting Topologies (rtNEAT) method for evolving increasingly complex artificial neural networks in real time, as a game is being played. The rtNEAT method allows agents to change and improve during the game. In fact, rtNEAT makes possible an entirely new genre of video games in which the player trains a team of agents through a series of customized exercises.

To demonstrate this concept, the NeuroEvolving Robotic Operatives (NERO) game was built based on rtNEAT. In NERO, the player trains a team of virtual robots for combat against other players' teams. This paper describes results from this novel application of machine learning, and demonstrates that rtNEAT makes possible video games like NERO where agents evolve and adapt in real time. In the future, rtNEAT may allow new kinds of educational and training applications through interactive and adapting games.”

You can download the paper from the website:

Real-time Neuroevolution in the NERO Video Game
K. Stanley, B. Bryant, and R. Miikkulainen
IEEE Transactions on Evolutionary Computation 9, 2005.
Download PDF (648 KB)

Evaluation

NERO certainly suggests that this technique has some applicability to games, although it may well require AI-driven gameplay where the agents must learn or be taught by the player. Otherwise, NEAT will have a hard time shaking the dubious reputation that neural networks and genetic algorithms have in the games industry.

Applicability to games: 4/10
At the moment, this type of learning is only applicable to a niche set of games, namely those in which the player trains his own army to fight, as in NERO. In more general circumstances, this solution is not robust enough for an AI that learns after the game has shipped (although there's certainly demand for that from players).
Usefulness for character AI: 2/10
This technique works well for generalized combat routines where random behavior isn't too much of a problem; however, individual NPCs using this technique would likely end up with worse AI than standard “scripted” techniques, or take longer to train to the same standard. There's potential for offline training, but the NN/GA black-box combination will often result in troublesome behaviors, and it requires a completely different skillset to develop.
Simplicity to implement: 5/10
There are two implementation options. The first is as a game mechanic, where the player acts as a trainer teaching the AI; this is relatively easy to implement, as shown with NERO, although the gameplay must be nailed down early to avoid having to alter the AI significantly late in development. The second is using it in a standard commercial game as a way to train the AI, which will take much more effort due to the QA nightmares caused by unpredictable and hard-to-debug behaviors!

Training Situations using Evolution


Figure 3: Training and in-game situations from NERO.

Discussion

The idea of using learning in a game is not exactly new; from Creatures to Black and White, it has been implemented relatively successfully. There’s no doubt lots of potential for new types of games, but is the technology robust enough to scale to more complex environments? It’s been three years since this paper was written, and there’s nothing new on the horizon…

“The evolutionary approach suffers from a complex training procedure.”

With evolutionary techniques in general, there's a huge amount of complexity involved in the training. Sometimes a single unaccounted-for item that the AI does not know how to deal with breaks an otherwise well-taught army. This makes the approach difficult to sell as a game mechanic in itself, since players are likely to be frustrated by the slow process of teaching basic concepts, like moving around a wall towards the enemy! The same frustrations also make the technology unfit for offline use by designers.

However, this does point the way to hybrid solutions where NNs and GAs are combined with classical AI techniques like planners or behavior trees. This would allow the in-game actors to have basic skills at a low level, yet also learn and adapt at a high level.
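
As an illustration of what such a hybrid might look like, here is a small Python sketch in which a hand-authored behavior tree provides the low-level skills while an evolved network decides which branch to run. This is purely speculative; the classes, the network.activate() call and agent.sensor_inputs() are assumptions for the example:

    class Selector:
        """Classic behavior-tree selector: runs children in order until one succeeds."""
        def __init__(self, children):
            self.children = children

        def tick(self, agent):
            return any(child.tick(agent) for child in self.children)

    class LearnedSelector:
        """Selector whose child ordering is chosen by an evolved network."""
        def __init__(self, network, children):
            self.network = network        # e.g. an evolved NEAT network
            self.children = children

        def tick(self, agent):
            # The network scores each scripted branch from the agent's sensors...
            scores = self.network.activate(agent.sensor_inputs())
            # ...and the branches are tried in the order the network prefers.
            ranked = sorted(zip(scores, self.children), key=lambda pair: -pair[0])
            return any(child.tick(agent) for _, child in ranked)

In a scheme like this, the scripted leaves keep the behavior predictable and debuggable, while evolution only has to learn the high-level choice of which skill to apply.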

Do you think this technique is usable in commercial games, or is it only applicable to niche genres where the player teaches or influences the learning of their “pets,” “robots” or “army”? Post a comment below!

Comments (3)

ChrisJRock on April 18th, 2008

It sounds like it would function better if it took more from nature. If it's meant to simulate human behavior, it has to perceive the world around it the way a human would including habits that may or may not be beneficial (superstitions). And the evolution function needs to be directly linked to success, rather than artificially as in the case of "fitness" judgment. Real organisms evolve to better survive not because it makes them "superior," but because if they're dead, they don't reproduce. In any use of a genetic evolutionary process, you're not getting the best creature all around, you're getting the best reproducer. If your aim is to design the best FPS bot, you have to link reproduction to successful kills and let the chaos play out. Part of that process would be allowing the units to evolve the best judgment for selecting mates. Otherwise your selection process will hold on to flaws the same way old scripted AI does.

JonBWalsh on April 18th, 2008

I only had time to read the blog post and skim the article but it seemed to be a little light on the performance of implementing the rtNEATS. It seems a little beyond the scope of the paper but is pretty important if you're trying to apply it to other game genres.

I'm curious about the application of such a system. It seems that for most games the benefits would be minimal and maybe even harmful because of the way that the agents need training to perform well. As the paper suggests there could be room for this, or other applications of ML, in god games which is fair enough but that's not a very big genre.

The one genre that possibly comes to mind for me for an application of such a technology would be MMORPGs. MMORPGs go through many many generations of a large number of agents relatively quickly and the world persists for long periods of time. This, to me, would make it the perfect environment to allow the AI to incorporate machine learning. An additional benefit is typically the AI in MMORPGs are fairly restricted in possible actions and the QA testing time is relatively large which would help to minimize any errant behaviors from making it through the QA process.

The benefit I could see this giving to MMORPGs would be an improved method of the 'hate list' and use of abilities. Typically agents in MMORPGs are highly predictable in both who they are going to attack and their use of abilities. A MMORPG wishing for more dynamic battles could choose to implement something like the rtNEAT system to allow agents to develop criteria for their hate list over time. This could really allow for the monsters to have a targeting system that takes into account more factors and criteria allowing them to ideally make more interesting actions in who to target.

On the converse side though there's still many drawbacks that would seemingly make using this technology impractical. The foremost being the cost to implement and performance restrictions of AI in MMORPGs. Especially with the more limited actions the AI takes it's possible that there just isn't enough for the AI to learn to make it a worthwhile endeavor. The performance would also be a huge concern as the AI in MMORPGs typically seem to have to be relatively simple to keep the system load to a minimum. So overall it sounds like a relatively niche implementation but it was a good read and it did get me thinking about AI in MMORPGs.

stelabouras on June 17th, 2008

I have recently made a presentation based on the "Real-time Neuroevolution in the NERO Video Game" paper for the "Computational Intelligence" course for my master and you can find it in the link below:

http://www.slideshare.net/stelabouras/evolving-neural-network-agents-in-the-nero-video

I personally believe that rtNEAT opens up a new era in gaming due to its simple implementation and its way to evolve NPCs in real time, keeping the player interested in the game.
