The Neural Network

This is the second tutorial in the series, dealing with artificial neural networks (NN). Building on our previous work, we'll create another animat known as Onno. The AI is based on machine learning (ML) techniques used to create deathmatch behaviours with indirect assistance from the developer.

Table of Contents

Overview
Methodology
Development & Experimentation
Discussion
Conclusion

Readers who enjoy this issue of Exercises in Game AI Programming are encouraged to:

This tutorial covers the background details first, introducing the NN technology available and the requirements for Onno. Then, we'll discuss the methodology used to train the system, and we'll analyse the different parameters in depth. Finally, the results are studied and explained.

Overview

Goals

Onno has the exact same functional requirements as the previous implementation (see The Rule-Based System). Specifically, it should be capable of movement and fire control for attacking the enemy (motor skills), as well as weapon selection and tactical reasoning (decision making). This time, it's the NN that's used to decide which actions to take based on the current situation.

The previous rule-based solution could be classed as an expert system, where the developer provides knowledge explicitly as a set of statements. However, in this tutorial, we'll approach the development differently in order to maximise the benefits of our NN. We'll attempt to use a more implicit approach where the system learns from a few examples provided by the expert.

Technology

A standard neural network architecture, called the perceptron, is used here. The perceptron determines a set of output values from the corresponding inputs through a process called forward propagation. This allows perceptrons to perform pattern recognition on real numbers; the inputs are the data samples (also known as predictor variables) and the outputs represent the patterns (also known as response variables).

A perceptron works thanks to floating-point connections between inputs and outputs, known as the weights. The weights are essentially multiplied with the inputs to produce the outputs. Perceptrons have the advantage of being able to learn relationships from examples. To do this, the weights are adjusted by comparing the actual result to the desired result. This process is based on backward propagation. Some training procedures, like backprop, use the examples one by one (incremental), while algorithms like RProp process the database of examples as a whole (batch).
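As a rough illustration of forward propagation and incremental weight adjustment, here is a minimal sketch of a single-layer perceptron. It is not the code used for Onno: the tanh activation, the learning rate, the function names and the toy dataset are all assumptions made for this example.

```python
import math

def forward(weights, biases, inputs):
    """Forward propagation: one tanh output per row of weights."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def train_step(weights, biases, inputs, targets, rate=0.1):
    """Incremental update: compare actual and desired outputs, adjust weights."""
    outputs = forward(weights, biases, inputs)
    for j, (y, t) in enumerate(zip(outputs, targets)):
        # Output error scaled by the derivative of tanh, which is 1 - y^2.
        delta = (t - y) * (1.0 - y * y)
        for i, x in enumerate(inputs):
            weights[j][i] += rate * delta * x
        biases[j] += rate * delta
    return outputs

# Hypothetical toy problem: the output should follow the first input,
# with desired values encoded as -1.0 (false) and +1.0 (true).
samples = [([1.0, 0.0], [1.0]), ([0.0, 1.0], [-1.0]),
           ([1.0, 1.0], [1.0]), ([0.0, 0.0], [-1.0])]
weights, biases = [[0.0, 0.0]], [0.0]
for _ in range(200):
    for x, t in samples:
        train_step(weights, biases, x, t)
```

A batch algorithm would instead accumulate the adjustments over every sample before changing the weights once per pass.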

Perceptrons are computationally very simple, but multiple perceptrons may be cascaded together. Essentially, the output of the first layer of weights is connected to the input of the next. With the right parameters, this increases the capabilities of the NN so it can solve non-linear problems too. These are known as multi-layer perceptrons, or MLP for short.
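To make the cascading concrete, the sketch below chains two layers and uses hand-picked weights to compute XOR, a classic problem that no single layer of weights can solve. The weights, the tanh activation and all names are illustrative assumptions, not values taken from Onno.

```python
import math

def layer(weights, biases, inputs):
    """One layer of weights followed by a tanh activation."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def mlp_forward(layers, inputs):
    """Feed the outputs of each layer into the inputs of the next."""
    values = inputs
    for weights, biases in layers:
        values = layer(weights, biases, values)
    return values

# Hand-picked weights (an assumption for illustration, not learnt values).
XOR_NET = [
    ([[5.0, 5.0], [5.0, 5.0]], [-2.5, -7.5]),  # hidden units: roughly OR and AND
    ([[5.0, -5.0]], [-5.0]),                    # output: OR but not AND
]

for a in (0, 1):
    for b in (0, 1):
        result = mlp_forward(XOR_NET, [float(a), float(b)])[0]
        print(a, b, "true" if result > 0.0 else "false")
```

The two hidden units here are the "intermediate values in the middle layers"; the sign of the final output is read as the boolean answer.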

Tip

Knowledge of the theory behind neural networks isn't essential for applying them in practice, but it certainly helps. Keep in mind that there are two important parameters that we'll use to tweak the perceptron: the total number of layers, and the number of intermediate values in the middle layers (aka. hidden units).

Principles

A lot of the low-level C++ code can be reused from Breaker's implementation. This includes most of the sensor and effector functions that allow the AI technique to interact with the world. So in essence, just like in the previous tutorial, the wrapper C++ code is given as little responsibility as possible in order to push the neural network to its limits.

One of the advantages of substituting the RBS directly with a NN is that most of our efforts in this week's issue can focus on the methodology of working with machine learning (ML). Compared to more direct programming, this tutorial is a change in approach that is closer to academic practices than to those used in the games industry. These skills will become increasingly important as game AI matures.

Methodology

The main concern for training the perceptron is where to get the examples from. The data itself is very important for NN, and even ML in general; it's extremely difficult to learn anything valuable from poor quality data!

There are two main sources from which to collect the data. A perceptron can learn from:

  1. Behaviours of human players. Sample situations are gathered in a deathmatch game and used as a model for the NN to imitate.

  2. Expert crafted examples. The AI designer can provide specific cases from which a complete policy can be learnt.

In this tutorial, we'll use the second option. As a matter of fact, it turns out that we crafted expert examples to create Breaker in The Rule-Based System, so we'll reuse most of that work to build a database of training samples.

Note

There are other ways to train a NN, but the approach used here is most characteristic of perceptrons and supervised learning in general. Later issues in this series cover other possible paradigms such as artificial evolution and reinforcement learning — which can be used in conjunction with NN.

Working with machine learning is a very uncertain process, much more so than using explicit AI techniques. In fact, I often consider it more like research and development than programming. Our methodology in this tutorial is, therefore, to minimize doubt by building incrementally on working prototypes. In practice, this means we'll start with simple wandering abilities and move progressively to full deathmatch behaviours. Once that works, we can focus on solving increasingly hard pattern recognition problems with our perceptrons.

Development & Experimentation

Essentially, the rule-based system is used to provide ideal examples created by the expert. The if-then format proves to be a very intuitive representation to deal with generally, but it's also more efficient for us to reuse the previous tutorial. So, a slightly modified version of Breaker is used to gather data samples. The input/output pairs are dumped to a log file, based on the content of the working memory. The data samples are written constantly to the log file as the simulation progresses. To speed up the process and increase the diversity of the data samples, time acceleration is used.

Two data sets are created in this way:

  • Wandering — Only one animat is inserted into the game, so only obstacle avoidance, wandering and gathering behaviours ever get used.

  • Combat — Four animats are inserted into the game, so the entire spectrum of behaviours gets applied during the simulation, including combat.

The format of the file is one line per data sample, with senses and actions separated by a dash. On the left are the boolean values for each of the 13 sensors used by the RBS, and on the right are the 17 possible actions:

0 0 0 0 0 0 0 1 0 0 1 1 0  -  0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1

The order of the binary numbers is important, as each corresponds to a specific sensor and effector. The order is the same as in the previous tutorial, but see the source code for more details.
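Reading one line of this log format back into input/output lists could look like the following sketch. The function and constant names are hypothetical; only the 13/17 split and the dash separator come from the description above.

```python
NUM_SENSORS = 13   # boolean senses on the left of the dash
NUM_ACTIONS = 17   # possible actions on the right of the dash

def parse_sample(line):
    """Split one log line into a list of senses and a list of actions."""
    senses_text, _, actions_text = line.partition("-")
    senses = [int(v) for v in senses_text.split()]
    actions = [int(v) for v in actions_text.split()]
    if len(senses) != NUM_SENSORS or len(actions) != NUM_ACTIONS:
        raise ValueError("unexpected sample size: %r" % line)
    return senses, actions

line = "0 0 0 0 0 0 0 1 0 0 1 1 0  -  0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1"
senses, actions = parse_sample(line)
```

Checking the counts on every line is a cheap way to catch a logger and a reader that disagree about the sensor or effector order.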

While the log files reach a few hundred kilobytes quite quickly, not all the data is necessary; there is a lot of redundant information. Repeated data samples emphasize how often the situations occur in the game, so they could be used to evaluate the learnt behaviours. However, during learning, it's much better to have a smaller dataset with a more diverse coverage of the problem space. So, part of the preparation to get the neural network to work involves filtering this training data.

Pre-processing of the data samples is done with a small Python script that was made specially for this task. Downloading Perl snippets from random mailing lists proved to be more trouble than it was worth. A custom Python script has the advantage of being easily tweaked for any purpose that comes up (see the file randomize.py in the SDK). The script essentially removes all the duplicate lines, randomizes their order, and formats the output as a legal C++ array definition. The result is two data files:

  • WanderingSamples.h with 78 cases.

  • CombatSamples.h with 586 cases.
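The three steps performed by the script (removing duplicates, randomizing the order, emitting a C++ array) can be sketched as follows. This is not the randomize.py shipped in the SDK; the function names, the array layout (senses and actions flattened into one row) and the optional seed are assumptions for illustration.

```python
import random

def preprocess(lines, seed=None):
    """Remove blank and duplicate lines, then shuffle what remains."""
    # Normalising whitespace lets lines that differ only in spacing collapse.
    unique = sorted({" ".join(line.split()) for line in lines if line.strip()})
    random.Random(seed).shuffle(unique)
    return unique

def to_cpp_array(samples, name):
    """Format the samples as a C++ array definition, one row per sample.

    Assumes at least one sample; the dash separator is dropped so each row
    holds the senses followed by the actions.
    """
    rows = [[v for v in sample.split() if v != "-"] for sample in samples]
    width = len(rows[0])
    lines = ["const int %s[%d][%d] =" % (name, len(rows), width), "{"]
    lines += ["    { " + ", ".join(row) + " }," for row in rows]
    lines.append("};")
    return "\n".join(lines)

log = ["0 1 - 1", "0 1  -  1", "1 0 - 0", ""]
print(to_cpp_array(preprocess(log, seed=1), "Samples"))
```

Passing a fixed seed makes the shuffle repeatable, which helps when comparing training runs on the same data.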

Each of the datasets is split into two: a training set and a validation set. The training set is used to teach the NN, while the validation set is used to check the learnt results against other “official” data. Both sets are picked randomly, rather than in the order the data samples were gathered. (This is why the script randomizes the lines of the log file.)
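Because the samples are already in random order, the split itself can be a simple cut. The sketch below assumes a hypothetical `split_dataset` helper and uses placeholder sample values; only the dataset sizes come from the text.

```python
def split_dataset(samples, training_fraction):
    """Cut an already shuffled list into training and validation sets."""
    cut = int(len(samples) * training_fraction)
    return samples[:cut], samples[cut:]

wandering = list(range(78))    # placeholders for the 78 wandering cases
combat = list(range(586))      # placeholders for the 586 combat cases

training, validation = split_dataset(wandering, 0.25)
print(len(training), len(validation))   # 19 59
```

A fraction of 1.0 gives the full training set with an empty validation set.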

Gathering data with a RBS in an accelerated simulation provides very good coverage of all the kinds of situation that can occur. While the dataset is certainly not complete, it's more than enough. If the NN is capable of solving the problem, using a full training set and no validation set should provide acceptable behaviours. Then, reducing the size of the training set would push the perceptron to its limits, and reveal how well it can learn to generalise from specific examples.

Note

A good way to evaluate performance is to compute the average error. The perceptron's outputs are in the range [-1.0, 1.0], but since we're making boolean decisions, it's acceptable to use just the sign. An average error around 1.0 means the result has the right sign in roughly 50% of the cases. As the average error decreases, the likelihood of the output sign being correct increases.
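The two measures mentioned in this note can be sketched as below. The exact error formula is not given in the text, so the mean absolute difference is an assumption here, as is the encoding of boolean targets as -1.0 and +1.0; all names are hypothetical.

```python
def to_target(bit):
    """Encode a boolean training value as -1.0 or +1.0 (an assumption)."""
    return 1.0 if bit else -1.0

def average_error(outputs, targets):
    """Mean absolute difference between actual and desired outputs."""
    return sum(abs(o - t) for o, t in zip(outputs, targets)) / len(targets)

def sign_accuracy(outputs, targets):
    """Fraction of outputs whose sign matches the desired sign."""
    hits = sum((o > 0.0) == (t > 0.0) for o, t in zip(outputs, targets))
    return hits / len(targets)

outputs = [0.9, -0.8, 0.2, -0.1]
targets = [to_target(b) for b in (1, 0, 0, 1)]
print(average_error(outputs, targets), sign_accuracy(outputs, targets))
```

In this small example two of the four signs are wrong, and the average error is correspondingly high.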

The following experiments build up complexity in small steps:

  1. For starters, we can gauge how complex the wandering problem is by trying to learn 100% of the cases without any validation set. Using batch training first (it's more powerful) with a single-layer perceptron (it's simpler), the average error drops below 0.1 quite easily, after barely 23 iterations. The resulting behaviours are very good, so the data indeed provides a good approximation of the entire solution.

  2. Using a training set of 25% of the samples, the error drops faster and lower during learning, but validation reports a high error of over 0.386. The behaviours are strange, as the bot does not even move forwards. This is a sign of overfitting, as the training samples have been learnt too closely. Stopping after 15 iterations brings the validation error down to almost 0.3, but the behaviours still suffer a bit (Onno prefers turning left).

  3. Moving on to the bigger combat dataset, a single-layer perceptron achieves 0.1 error on the whole training set. However, Onno does not seem capable of handling even wandering behaviours. After a bit of experimentation, it becomes obvious that only fragments of the complete deathmatch combat behaviours can be replicated by a single-layer perceptron. The problem must be non-linear.

  4. Using a multi-layer perceptron on the entire dataset solves the problem. With almost any reasonable number of hidden units (around 10 to 20), the error gets down to 0.05 easily. 26 hidden units, or twice the number of inputs, seems to provide the best error with 0.01 after two thousand iterations (see Figure 1, “Convergence of Various Perceptrons on the Deathmatch Dataset”). The behaviours are faultless, and Onno measures up well in combat against Breaker.

  5. From the previous experiment, we can assume that the combat dataset is also satisfactory, covering at least the important input/output samples. To make the problem harder for the NN, we can reduce the number of training samples provided. (In theory, the designer would have to do less work.) With 40% training samples, the behaviours are still acceptable and there are no problems with overfitting.

  6. At 20%, things become more challenging; a fair amount of experimentation is required to get working behaviours. A smaller number of hidden units (5 to 9) with early stopping (50 to 100 epochs) helps the algorithm generalise a deathmatch policy. However, the result is no longer acceptable in quality, and is noticeably inconsistent.
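The experiments above stop training after a hand-picked number of iterations. A common variation, sketched here as an assumption rather than as the method used for Onno, is to watch the validation error and stop once it has not improved for a given number of epochs (the `patience` parameter and the error values are hypothetical).

```python
def early_stopping_epoch(validation_errors, patience=5):
    """Return (epoch, error) of the best validation error seen before
    `patience` epochs pass without any improvement."""
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break  # the validation error has stopped improving
    return best_epoch, best_error

# Hypothetical validation errors: they fall, then rise as overfitting sets in.
errors = [0.9, 0.6, 0.4, 0.3, 0.35, 0.38, 0.39, 0.4]
print(early_stopping_epoch(errors, patience=3))   # (3, 0.3)
```

In a real training loop the weights saved at the best epoch would be the ones kept for the game.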

Multiple perceptrons with varying numbers of hidden units were trained on the full deathmatch dataset. Having no hidden layer (0 hidden units) gives the worst error, while 26 hidden units converge to the lowest error.

Figure 1. Convergence of Various Perceptrons on the Deathmatch Dataset

To increase the responsibilities of the NN even further, we could make the problem more complex by reducing the number of inputs and outputs. For example, instead of having two inputs indicating low health and full health, we could merge those into one continuous input. The same goes for outputs; instead of having separate values for turn left and turn right, we could express turning as one continuous output value. Apart from changes to the data preparation process, the neural network would probably require a few more hidden units to deal with the additional complexity.
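A minimal sketch of such merged encodings is shown below. The function names, the health scale of 100 and the sign convention for turning are all assumptions for illustration; the SDK may represent these values differently.

```python
def health_input(health, max_health=100.0):
    """One continuous input in [0, 1] replacing 'low health' and 'full health'."""
    return max(0.0, min(1.0, health / max_health))

def turn_target(turn_left, turn_right):
    """One signed training value replacing two boolean turn actions:
    -1.0 for left, +1.0 for right, 0.0 for neither (or both)."""
    return float(turn_right) - float(turn_left)

print(health_input(25.0))          # 0.25
print(turn_target(True, False))    # -1.0
```

With this encoding the network has to work out for itself where "low" health begins, which is part of the extra responsibility described above.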

Discussion

The implementation of Onno as described above is available as part of the FEAR SDK, which can be found on SourceForge.net.

The code for Onno can be found in #/demos/tutorial/02-Onno/. See the FEAR User Guide to get started.

About the Technology

When using neural networks and machine learning, it's very easy to start doubting almost every part of the implementation if something doesn't work. In fact, the code for gathering the data and the script that pre-processes it were both rewritten twice, for elegance as much as correctness! The whole development process requires good methods, solid practices and strict discipline.

The NN itself does very well when there's a lot of quality data. As the quantity or quality of data decreases, the perceptron struggles to produce useful results. Having a designer specify consistent data manually would be less error prone, but probably more time consuming. For this reason, bootstrapping the NN by training it with a RBS first proves to be a great compromise, as it gets good results upfront but offers all the benefits of the perceptron.

One major difference in setup between this tutorial and the previous one is that the RBS is based on boolean symbols, while the NN uses continuous values by default. The training samples are boolean too, but the NN learns to estimate the results in continuous space. To make the most of this, it's necessary to change the senses and actions to use floating-point values that express a level of confidence, as in fuzzy logic.

Note

Another issue of this series discusses fuzzy logic in more depth. In essence, values of 0.0f indicate zero confidence in the result, and 1.0f implies complete confidence. Values in between have partial degrees of membership.

Making each of the senses and actions continuous is relatively simple in most cases; for example, the distance to an obstacle is better suited to a continuous value than a boolean. In other cases, computing a degree of membership is less trivial (e.g. a collision or the presence of an object), so these are best left as booleans. However, modifying the input and output values in such a way can disrupt the functioning of the NN. This can be fixed by using a non-linear function (for example x*x) to compute the degrees of membership so they are closer to their boolean counterparts.
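As a sketch of this idea, the function below turns an obstacle distance into a degree of membership and optionally squares it. The function name, the sensor range of 256 units and the "closeness" convention are assumptions for illustration only.

```python
def obstacle_membership(distance, max_range=256.0, squared=True):
    """Degree of membership in 'obstacle nearby', between 0.0 and 1.0."""
    # 1.0 when touching the obstacle, 0.0 at or beyond the sensor range.
    closeness = 1.0 - min(max(distance / max_range, 0.0), 1.0)
    # Squaring (the x*x function) pushes intermediate values towards 0.0,
    # so only close obstacles produce a strong signal.
    return closeness * closeness if squared else closeness

for d in (0.0, 64.0, 128.0, 256.0):
    print(d, obstacle_membership(d, squared=False), obstacle_membership(d))
```

At half the sensor range the linear value is 0.5 while the squared value is 0.25, which is closer to the boolean "no obstacle" the network was trained on.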

Evaluating the Behaviours

The behaviours produced by the NN are a bit smoother than the RBS because of the “neuro-fuzzy” representation. This doesn't make much difference to the combat skills, but it does visibly increase the levels of realism. However, replacing the boolean values with continuous ones leads to behaviour glitches. For example, if the obstacle distance sensors are not sensitive enough, the animats just ignore the side walls and get stuck in some cases. (Using a different function to encode these input values fixes the problem.)

The neural network developed in this tutorial is trained as a pre-process (offline), and does not learn during the game itself. Since the AI is driven by a fixed input-output mapping, it is comparable with a rule-based system, and the behaviours of Onno are indeed similar to those of Breaker. There are no obvious flaws, and the result of a head-to-head deathmatch tends to be fairly even (although in the few games I watched Onno was winning slightly, for no good reason).

The behaviours provided by the NN are static in this tutorial, as they simply rely on training data. This is a standard approach for doing statistical machine learning. The advantage of using a NN over the RBS, apart from smoother behaviours, is that the AI can be enhanced to learn online. Reinforcement learning is a suitable technique for this purpose, but we'll apply that in a later issue of this tutorial series.

Conclusion

Neural networks are often difficult to apply in practice, as the resulting behaviours can be unpredictable and hard to manage. However, in this tutorial, we saw how to create a neural network that could produce AI at least as good as a rule-based approach. We used the expert examples provided by the RBS from the previous tutorial to train the neural network. The advantage of the NN is that it can deal with fuzzy inputs and outputs to provide smoother behaviours. A designer could also have provided key examples for the perceptron to learn from. Finally, the NN is in a format that is well suited to learning within the game using reinforcement learning, as we'll see in a later issue of this tutorial series.

The next tutorial uses decision trees to create similar deathmatch behaviours.

Be sure to check out chapters 17 to 20 of the book AI Game Development: Synthetic Creatures with Learning and Reactive Behaviors for more in-depth theory, and tips on using NN in practice.

Until next time…