One approach to machine learning that's getting more attention these days is learning by example (also known as imitation). Not only are there more academic papers on the subject, but parts of industry and middleware companies are also turning towards this kind of ML for answers. Will Wright recently said that the "Holy Grail is crowd-sourcing the algorithms, AI, and procedures in the game" and Jeff Orkin retired from the comfort of his AI job at Monolith to research data-mining gameplay sessions.
Imitation learning is appealing because it puts more control into the hands of the designers, treating them as trainers who provide example behaviors for a system to learn from. Compared with other solutions, the AI has to make fewer assumptions about the kind of results desired; since there's less room for error, it seems a good fit for the games industry.
Going beyond academic prototypes, few developers are applying these ideas in commercial games — with the notable exception of TruSoft, makers of behavior capture middleware (and sponsors of AiGameDev.com). What follows below are video highlights of their Artificial Contender solution (see this page for more), two blog posts that I previously wrote about the system, and my most recent thoughts.
Artificial Contender and Behavior Capture
This section was originally published via my personal blog on July 11th, 2006. I've made some minor edits since then, added some inline comments, and TruSoft let me repost some of the videos (see them on the site).
The World Cup is over (with a somewhat disappointing ending), but before everyone moves on, I want to talk about some interesting football-related technology. I just received news of a new AI SDK for games called Artificial Contender, developed by TruSoft. This middleware allows game developers to add behavior capture to their games, and was applied successfully in the making of Sony's This Is Football 2005.
Screenshot 1: This Is Football 2005.
There are a few demos on the site, including videos of gameplay, tools, and a promotional white paper — but you have to register for them. If you watch the videos, you'll see a rather familiar English football team conceding a few goals and trying to make up for it by hoofing the ball forward. Except in this case, it's entirely the desired behavior! The designer used such long passes during training, whereas the Brazilians use shorter passes and dodging runs. Anyway, TruSoft don't give away too many interesting details on the site (as you might expect, since they're selling a product), but enough to get a good overview of the product. The system is based on a combination of technology: a directed graph that models the behavior policy, and a tree-like data structure that indexes the captured behavior samples. Informally, the graph stores the samples and the tree provides abstractions for situations.
At the base, there's a training mode that allows the designer to record a series of situation → action pairs. All the behavior recordings are combined into a "directed semantic graph" (obviously, nodes are situations and edges are actions). To compare this with motion capture solutions, individual training runs made by the designers correspond to single mocap clips, and the semantic graph is analogous to a move-tree or motion-graph, built by combining similar animation poses into a single node.
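To make the idea of merging training runs into a single graph concrete, here's a minimal sketch in Python. The class name, node encoding, and API are all my own invention for illustration; TruSoft's actual representation is not public.

```python
from collections import defaultdict

class SituationGraph:
    """Toy sketch of a 'directed semantic graph': nodes are situations,
    edges are the actions the trainer demonstrated between them."""

    def __init__(self):
        # situation -> {action: resulting situation}
        self.edges = defaultdict(dict)

    def record(self, situation, action, next_situation):
        """Add one situation -> action pair from a training run."""
        self.edges[situation][action] = next_situation

    def actions(self, situation):
        """Actions the trainer demonstrated in this situation."""
        return list(self.edges[situation].keys())

# Two training runs that pass through the same situation are merged
# automatically, because they share a node:
g = SituationGraph()
g.record("own_half_open", "long_pass", "attacking_third")
g.record("attacking_third", "shoot", "goal_scored")
g.record("own_half_open", "dribble", "midfield")

print(sorted(g.actions("own_half_open")))  # → ['dribble', 'long_pass']
```

Just as a move-tree reuses one node for many similar poses, here one situation node accumulates every action recorded from it across training runs.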
Screenshot 2: Situation Graph structure used by Artificial Contender.
View Original (TruSoft.com)
To help the system create general graphs from individual trials, there's a database back-end that stores the training samples and allows them to be queried, e.g. "find situations similar to this." The graph is created on the fly within the game; a database — in the traditional sense — is only used to fine-tune the system's performance. I presume the data structure is similar to R-Trees with a hierarchical index, which they call generalization trees. An important feature of R-Trees is that they work in n-dimensional space, assuming it's possible to measure the similarity of two situations. The tree can thereby group similar situations under nearby leaf nodes, which makes it possible to build the semantic graph the same way motion graphs are built: by merging similar situations into one node (though with motion capture, this is much easier to accomplish).
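The key assumption above — a similarity measure over situations — can be illustrated with a tiny greedy grouping sketch. The feature encoding, metric, and threshold below are all assumptions of mine, not details of the AC product:

```python
import math

def situation_distance(a, b):
    """Euclidean distance between two situation feature vectors
    (the features and metric here are illustrative assumptions)."""
    return math.dist(a, b)

def merge_similar(situations, threshold):
    """Greedily group situations whose distance to a cluster's first
    member is below the threshold, akin to merging similar poses
    when building a motion graph."""
    clusters = []
    for s in situations:
        for cluster in clusters:
            if situation_distance(s, cluster[0]) < threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters

# e.g. situations encoded as (ball_x, ball_y) positions on the pitch
sits = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
print(len(merge_similar(sits, threshold=1.0)))  # → 2
```

A real generalization tree would index these clusters hierarchically (so queries like "find situations similar to this" stay fast), but the grouping step is the part that lets two near-identical game states share a graph node.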
Screenshot 3: The AC Viewer. View Original (TruSoft.com)
In practice, it seems the following steps are involved to integrate Artificial Contender into a game:
Model all possible sensors (inputs) and actuators (outputs) of the AI using an API provided by the SDK.
Manually create a hierarchical model of the possible game situations, which allows the algorithm to create zoom levels for the data using generalization trees.
Integrate the AC library into the game engine, and make its functionality accessible to the designers from within the game.
Get the designers to play the game in a particular style, and then train the AC on the sampled data.
Last but not least, work out a sensible QA plan for testing the resulting behaviors, and apply it until the results are satisfactory!
Overall, this seems like a solid combination of technology, delivered in a well thought-out package. One part that stands out is the use of reinforcement learning, which is mentioned a few times in the white paper. The idea is that NPCs get positive reinforcement for achieving goals, and in the future choose a course of action that increases the likelihood of further reinforcement. Intuitively, this is the opposite of behavior capture, which doesn't give NPCs much autonomy in deciding what to do but instead "plays back" a similar training sample. Apparently, the use of RL is optional in the AC product; it helps fine-tune the behavior policy and can provide goal-directed behavior. Combining RL with the unsupervised learning algorithm described above allows AC to get around some typical problems with behavior capture.
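One way to picture how RL could fine-tune captured behavior without abandoning it is to weight the graph's demonstrated actions and nudge those weights by reward. This is my reading of the white paper, not TruSoft's implementation; the names and learning rate are assumptions:

```python
import random

# Edge weights for the actions trained in one situation; the agent only
# ever picks among actions the designer actually demonstrated.
weights = {"long_pass": 1.0, "dribble": 1.0}

def choose(weights, rng):
    """Sample an action in proportion to its weight."""
    total = sum(weights.values())
    return rng.choices(list(weights), [w / total for w in weights.values()])[0]

def reinforce(weights, action, reward, lr=0.5):
    """Positive reward makes the action more likely next time; the floor
    keeps every trained action available, so the style is preserved."""
    weights[action] = max(0.1, weights[action] + lr * reward)

reinforce(weights, "long_pass", reward=+1.0)   # the long pass led to a goal
assert weights["long_pass"] > weights["dribble"]
```

This matches the interview below: RL shifts the policy toward (or away from) goals, while the behavior stays within the boundaries of the trained style.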
From a production point of view, of the features mentioned on the Artificial Contender site, I think one is particularly valuable in the games industry today: designers are involved directly, without having to tweak scripts, and early on in development. I don't believe such middleware will save a substantial amount of time in development, but it'll mean that time is spent better elsewhere — notably on integrating, prototyping with, and testing Artificial Contender.
It'll be interesting to watch this product in the future; I'm curious if they'll just get bought out by a large publisher of sports franchises, or if they'll manage to find a way into the mainstream market and apply their solution successfully to other games. I expect TruSoft's consulting services will be an important part of making that happen; it'll take a lot of on-site experience with AI systems in general, and this product in particular to get something good out of it. I think those are the necessary conditions for designers to start tackling much harder problems.
Interview with Iskander Umarov
This section was published about a month after the text above, on August 8th, 2006, also via my personal blog. The video is a new addition, though, showing Artificial Contender's tools and underlying representation.
About a month ago, I stumbled upon a game AI middleware solution called Artificial Contender and wrote a brief review. Then, Iskander Umarov, technical director of TruSoft (the company that developed the AC product), sent in a few corrections and clarifications. I've been on a summer vacation for a few weeks, but in the meantime, Iskander was kind enough to answer some of my questions about the product and the technology itself.
Alex J. Champandard: Typically, how many training samples do the designers use to create a semantic directed graph for an NPC? Does the algorithm perform differently when there are more/less samples?
Iskander Umarov: The number of training samples depends on:
- The game genre and the game
- Complexity of a desired playing style for an NPC
Some examples of the typical number of minutes that a designer needs to train an AC agent to create a specific well-rounded playing style are as follows:
- Fighting games: 15-30 minutes.
- Sports games: 30-40 minutes.
- Real-time strategy games: 40-90 minutes.
- Some simple playing styles can become playable within 3-10 minutes of training.
- The actual number of minutes depends on the complexity of the game, the type of playing style being trained, and the speed.
- Depending on the game and the situation within it, one minute of training can generate about 10-120 training samples. For example, a typical fighting game will usually generate about 60-120 training samples a minute, while an RTS game will generate 10-60 training samples every minute.
At any time, a designer can add more training samples. During testing, a designer can always see a quality indicator (0-100%) for the AC agent's decisions, and it is usually obvious if more training is required. Moreover, a designer can choose to continue training an AC agent for specific situations only. For example, as soon as a designer sees a decrease in the AC agent's acting quality, they can switch the agent back to training. This functionality provides an easy way to train "substitutes," even for end-users: at any moment, an end-user can switch an AC agent between training and acting. For end-users, this functionality can be transparent: during "normal play" the AC agent is learning, and a button hands control over to a substitute, the trained AC agent.
Usually, more training generates superior, more human-like AC agents. The performance of the AC acting algorithm also increases with the number of training samples, i.e. the more the AC agent knows, the faster it works.
AJC: You mentioned that reinforcement learning is used to improve the behavior generated by the graph. Is this necessary because a hierarchical pattern matching approach is not necessarily always purposeful? To what extent does RL improve this situation?
IU: RL isn’t necessary by any means. All of our behavior-capture AI agents (i.e. agents with a purpose to copy a teacher’s behavior) were created without the use of RL. To clarify, the answer to the question “Is this necessary because a hierarchical pattern matching approach is not necessarily always purposeful?” is no. We used RL for the following purposes:
- Increase / decrease the difficulty level, i.e. the usage of RL helps to create stronger or weaker opponents compared to what was originally trained by a designer.
- Create an adaptive AC agent that will be changing its behavior to achieve set goals. (Note: the behavior can still be within the boundaries of a trained style.)
AJC: In many large problems, I’ve found RL has trouble scaling in practice. Does the combination of RL with the graphs address this problem?
IU: Yes. In our experiments with RL, the combination with graphs helps with scaling because graphs help preserve sequencing in the situation-action space. Also, Artificial Contender's game situation classification and generalization subsystems help by reducing the number of potential game situations and by speeding up search, respectively.
Screenshot 4: Artificial Contender was applied to multiple sports and simulation games.
AJC: How well do you think your solution will "scale up" to games that are not as confined as sports games?
IU: We are positive that Artificial Contender can be used efficiently in most game genres. So far, we have experimented with the following:
- Real-time strategy
I’d like to make several notes about the complexity of sports games. Even though the playing field in sports games looks much easier than labyrinths in FPS games or complex landscapes in RTS games, numerous factors on the playing field create challenges similar to the ones in other genres. Examples are:
Constantly changing positions of many players on the field
Danger of losing the ball during a pass
Danger of being tackled

Moving players on the field create dynamic labyrinths and obstacles.
Furthermore, these dynamic labyrinths can be very complex because there are no distinct borders. There are no impenetrable walls, and there are no clear spaces. Moving and passing directions can be more dangerous or less dangerous, but almost never absolutely safe or absolutely impossible. This continuity increases both the state space and the action space. To give another example, the issue of "taking cover" in an FPS can be compared to judging the possibility of a pass being intercepted.
AJC: What is going to be the major focus for TruSoft on the product in the near future? It seems a robust combination of technology already; are there any particular aspects you are keen to improve?

IU: The major focus for us right now is to integrate AC into more games and to support more game genres. For the next generation of AC technology, we see the following main areas to work on:
- Enhanced multi-agent cooperative behavior support.
- Capturing data from the real world. Currently, AC agents learn from human players playing the same video/computer game. In the future, we would love to see games where some AC agents have been trained directly on real-world data.
AJC: That sounds like a very interesting problem. I look forward to future developments from TruSoft... Thanks for your time Iskander!
Live Online Session, Sunday 24th
If you find the topic of behavior capture and imitation learning interesting, then be sure to join this weekend's session with Iskander Umarov, Technical Director of TruSoft, makers of Artificial Contender. Over the years, TruSoft has gained the most experience with deploying this kind of technology in games, so it'll be a fascinating discussion. (Why else would I spend 1h43 of my time in a pre-interview with Iskander? :-)
Click here for more details about the event. It's a public session, open for all to attend... See you then!