This week’s Thursday Theory article looks at academic research in Neuroevolution of Augmenting Topologies (a.k.a. NEAT) developed at the University of Texas. In particular, you’ll learn how it’s applied at runtime in a video game to allow the actors to learn neat behaviors over time by evolution. (You can curse at the pun if you must ;)
The paper reviewed below details how NEAT is applied practically in a game called Neuro-Evolving Robotic Operatives (NERO), which is a testbed developed for the research. The gameplay is entirely based around teaching robots with a real-time version of the algorithm, which results in a rather unique new game genre. (Stay tuned to AiGameDev.com for a full review of the game.)
From a purely academic perspective, trying to apply NEAT to games and runtime environments was probably a motivation for the research in itself. Such applications can often reveal interesting properties about algorithms, even if there are no advancements for commercial game AI.
That said, this project has interesting ramifications beyond NEAT, in particular in using stochastic optimization techniques to help create behaviors that are not scripted in the traditional way:
From a gameplay perspective, this approach could lead to new genres of games where the player trains in-game actors to perform tasks. NEAT turns out to be quite an efficient technique; while it uses neural networks, it starts out with a very simple set of behaviors and only expands the search space when doing so is found to be beneficial. It also performs well at finding beneficial behavior compared to fixed-topology evolution.
The evolutionary approach also has lots of potential for offline usage (as NEAT was originally intended) to help designers create behaviors using reward and punishment. Since individuals are evaluated one or two at a time, and the next generation is only created after the entire population is tested, it can take a lot of time to see any progress. (In the game, there is no way for the player to specify how the NEAT algorithm should run; that could be another game in itself!)
Granted, these ideas are still at the fringe of commercial games… But for the moment let’s take a look at how the system works under the hood.
Figure 1: Classes of agent behaviors evolvable with NEAT.
NEAT is similar to other techniques that evolve neural networks with genetic algorithms. The difference is that the evolution modifies the structure of the networks as well as the underlying weights. These networks are used to decide which actions to execute (outputs) based on information from the environment (inputs). The overall process ranks different candidate behaviors over multiple generations of agents; over time, behaviors are shaped by accumulated statistics, with each behavior rated on how well it fulfills the requirements set down by the fitness function.
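To make the "structure as well as weights" idea concrete, here's a minimal sketch of a NEAT-style genome with both kinds of mutation. The class and method names are my own illustration, not the paper's actual code:

```python
import itertools
import random

# A NEAT-style genome: mutation can jitter existing weights OR add new
# structure, so networks start simple and only grow in complexity.

class ConnectionGene:
    def __init__(self, src, dst, weight, innovation):
        self.src = src                # source node id
        self.dst = dst                # destination node id
        self.weight = weight
        self.enabled = True
        self.innovation = innovation  # historical marker used during crossover

class Genome:
    def __init__(self, nodes, connections):
        self.nodes = list(nodes)             # node ids
        self.connections = list(connections)

    def mutate_weights(self, rate=0.8, power=0.5):
        # Ordinary GA-style mutation: perturb existing weights.
        for conn in self.connections:
            if random.random() < rate:
                conn.weight += random.gauss(0.0, power)

    def mutate_add_node(self, innovations):
        # Structural mutation: split an existing connection by inserting
        # a new hidden node, expanding the search space incrementally.
        conn = random.choice([c for c in self.connections if c.enabled])
        conn.enabled = False
        new_node = max(self.nodes) + 1
        self.nodes.append(new_node)
        self.connections.append(
            ConnectionGene(conn.src, new_node, 1.0, next(innovations)))
        self.connections.append(
            ConnectionGene(new_node, conn.dst, conn.weight, next(innovations)))
```

The innovation numbers are the key trick in NEAT proper: they let crossover line up matching genes even after the two parents' topologies have diverged.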
To let agents learn online, a real-time implementation of NEAT is needed, which is the core focus of the paper. This is where rtNEAT comes in, as implemented in NERO. Rather than producing a new generation only after all existing members of the population have finished learning, the simulation puts an artificial timer on each agent's lifespan, evaluates its performance, and then either discards that agent or respawns it to work again, alongside dozens of other agents all working at the same time.
The loop used by rtNEAT, simplified, is as follows:
Calculate the fitness of all current individuals. For example, if there is a reward for getting close to an object, then actors that get close to an object have a higher fitness than those that do not.
Remove the agent with the worst adjusted fitness, provided it has been alive long enough. Agents shouldn't be removed before they've had time to improve, and the adjusted fitness ensures that those improving rapidly, but with low total fitness so far, don't get removed. Agents with less experience shouldn't be culled too early!
Re-estimate the average fitness for each species now that the worst agent has been removed.
Choose a parent species to create the new offspring, generating a new neural network using genetic operators like crossover and mutation.
Place the new agent in the world and gather statistics about its performance. Then rinse, and repeat.
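The steps above can be sketched as one replacement "tick." This is a simplified illustration of the loop as described, not NERO's actual implementation; the agent representation, the `MIN_AGE` threshold, and the breeding stub are all assumptions:

```python
import random

MIN_AGE = 10  # assumed: minimum ticks before an agent may be removed

def species_sizes(population):
    sizes = {}
    for a in population:
        sizes[a["species"]] = sizes.get(a["species"], 0) + 1
    return sizes

def rtneat_tick(population, evaluate, breed):
    # 1. Calculate the fitness of every current individual.
    for agent in population:
        agent["fitness"] = evaluate(agent)

    # 2. Remove the worst agent by *adjusted* fitness (here, fitness
    #    shared across its species), skipping agents still too young.
    sizes = species_sizes(population)
    eligible = [a for a in population if a["age"] >= MIN_AGE]
    if not eligible:
        return
    worst = min(eligible, key=lambda a: a["fitness"] / sizes[a["species"]])
    population.remove(worst)

    # 3. Re-estimate each species' average fitness after the removal.
    averages = {}
    for s, n in species_sizes(population).items():
        averages[s] = sum(a["fitness"] for a in population
                          if a["species"] == s) / n

    # 4. Pick a parent species in proportion to average fitness and
    #    create one offspring via crossover/mutation (stubbed here).
    parent_species = random.choices(list(averages),
                                    weights=list(averages.values()))[0]
    child = breed(parent_species)
    child["age"] = 0

    # 5. Place the new agent in the world; the loop then repeats.
    population.append(child)
```

Because only one agent is replaced per tick, the population as a whole keeps acting in the world while evolution proceeds in the background, which is exactly what makes the approach viable at runtime.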
The default lifespan used in NERO is 10,000 milliseconds (10 seconds) for an agent to stay alive and have its progress evaluated, while the loop runs continually. This means that in a sandbox area the agents visibly improve in a short amount of time, compared to the standard NEAT implementation, which requires a whole generation to pass before results are seen. The main implementation problem is choosing how long agents should stay alive to learn: too long and evolution risks slowing to a crawl; too short and the agents won't have ample opportunity to learn anything meaningful.
Figure 2: An example template of neural network topology.
Abstract & References
Here’s the abstract for the paper itself:
“In most modern video games, character behavior is scripted; no matter how many times the player exploits a weakness, that weakness is never repaired. Yet if game characters could learn through interacting with the player, behavior could improve as the game is played, keeping it interesting.
This paper introduces the real-time NeuroEvolution of Augmenting Topologies (rtNEAT) method for evolving increasingly complex artificial neural networks in real time, as a game is being played. The rtNEAT method allows agents to change and improve during the game. In fact, rtNEAT makes possible an entirely new genre of video games in which the player trains a team of agents through a series of customized exercises.
To demonstrate this concept, the NeuroEvolving Robotic Operatives (NERO) game was built based on rtNEAT. In NERO, the player trains a team of virtual robots for combat against other players’ teams. This paper describes results from this novel application of machine learning, and demonstrates that rtNEAT makes possible video games like NERO where agents evolve and adapt in real time. In the future, rtNEAT may allow new kinds of educational and training applications through interactive and adapting games.”
You can download the paper from the website:
Real-time Neuroevolution in the NERO Video Game
K. Stanley, B. Bryant, and R. Miikkulainen
IEEE Transactions on Evolutionary Computation 9, 2005.
Download PDF (648 KB)
NERO certainly suggests that this technique has some applicability to games, although it may well require AI-driven gameplay where the agents must learn or be taught by the player. Otherwise, NEAT will have a hard time shaking the dubious reputation of neural networks and genetic algorithms in the games industry.
- Applicability to games: 4/10
- At the moment, this type of learning is only applicable to a niche set of games, namely those where the player trains their own army to fight, as in NERO. In more general circumstances, this solution is not robust enough for an AI that keeps learning after the game has shipped (although there's certainly demand for that from players).
- Usefulness for character AI: 2/10
- This technique works well for generalized combat routines where random behavior isn't too much of a problem. However, individual NPCs using this technique would likely end up with worse AI than standard "scripted" techniques, or take longer to train to the same standard. There's potential for offline training, but the NN/GA black-box combination will most often result in troublesome behaviors, and it requires a completely different skillset to develop.
- Simplicity to implement: 5/10
- There are two implementation options. As a game mechanic, where the player acts as a trainer for the AI, it is relatively easy to implement, as shown with NERO, although the gameplay must be nailed down early to avoid having to alter the AI significantly late in development. For use in a standard commercial game as a way to train the AI, it will take much more effort to implement, due to possible QA nightmares caused by unpredictable and hard-to-debug behaviors!
Figure 3: Training and in-game situations from NERO.
The idea of using learning in a game is not exactly new; from Creatures to Black & White, it has been implemented relatively successfully. There's no doubt lots of potential for new types of games, but is the technology robust enough to scale to more complex environments? It's been three years since this paper was written, and there's nothing new on the horizon…
“The evolutionary approach suffers from a complex training procedure.”
With evolutionary techniques in general, there's a huge amount of complexity involved in the training. Sometimes a single unanticipated situation that the AI doesn't know how to handle will break an otherwise well-taught army. This makes the approach difficult to sell as a game mechanic in itself, given how much players may hate the slow process of teaching basic concepts, like moving towards the enemy around a wall! These same frustrations also make the technology unfit for offline use by designers.
However, this does point the way to hybrid solutions where NNs and GAs are combined with classical AI techniques like planners or behavior trees. This would allow the in-game actors to have basic skills at a low level, yet also learn and adapt at the high level.
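A hedged sketch of what such a hybrid might look like, assuming hand-scripted low-level skills with a small evolvable layer on top choosing between them. All names here are hypothetical, not from NERO or any shipped title:

```python
# Scripted, debuggable building blocks: designers keep full control
# of what each low-level skill actually does.
def attack(state):  return "attack"
def retreat(state): return "retreat"
def patrol(state):  return "patrol"

SKILLS = [attack, retreat, patrol]

def choose_skill(weights, inputs):
    # A single-layer "network": score each skill from the sensor inputs.
    # The weight matrix is the only part evolution would tune; the
    # skills themselves stay hand-authored and predictable.
    scores = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    return SKILLS[scores.index(max(scores))]
```

With inputs like (enemy proximity, own health pressure), a GA only has to evolve the weight matrix, which keeps the unpredictable black-box part small while QA can still test each scripted skill in isolation.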
Do you think this technique is usable in commercial games, or is it only applicable to niche genres where the player teaches or influences the learning of their “pets,” “robots” or “army”? Post a comment below!