Hierarchical Logic and Multi-threaded Game AI

Alex J. Champandard on August 15, 2008

These days, the topic of multi-threading comes up rather frequently. You’ll either hear sensationalist doomsday scenarios about how incredibly hard it is to implement, or how some language researcher has invented a revolutionary new technology that solves all the world’s concurrency problems. Meanwhile, in the background, game developers have been innovating quietly, squeezing more and more performance out of their multi-core hardware…

With the troublesome first generation of games on modern consoles out of the way, it’s fair to say that developers have generally achieved a good grasp of multi-threading technology. It’s not quite perfect yet, but there’s a solid base to build upon. This article looks at ways that multi-threading is currently applied to artificial intelligence in games, and sets the scene for a series of articles that will investigate the process of applying parallelism to hierarchical logic (e.g. behavior trees or task network planners).

NOTE: You can sign up here (free) to the brand new Insider’s Area and read the next tutorial in the series — along with its C++ source code. You’ll see various benchmarks of a proof-of-concept hierarchical planner built with Intel’s Threading Building Blocks library.

Typical Approach to Multi-threading

In practice, game AI is threaded in a very pragmatic manner: “What part of the code is taking the most time and how can it be optimized?” This approach has allowed developers to find bits of code that are well suited to being threaded, such as:

  • Pathfinding, when it’s low-level enough and doesn’t require callbacks to the main AI.

  • Sensory system processes, such as line-of-sight checks and many-to-many proximity checks.

  • High-level combat decisions where appropriate, such as terrain & situation analysis.

You can do relatively well with multi-threading by opportunistically offloading such computations to different threads when performance becomes a problem [PDF, 1]. But with a bit of experience, these systems can be designed upfront so they are better suited to being multi-threaded. This is particularly the case with heterogeneous architectures like the PS3; since you have only one main PPU and multiple smaller SPUs with very limited memory, you’ll typically need to architect your code & data so it can fit in SPU memory upfront.

Hardware Accelerated Line-Of-Sight

Screenshot 1: Line of sight calculations are relatively easy to multi-thread, and it’s even possible to offload such computations to dedicated hardware (demo).

With that in mind, developers increasingly build simple frameworks to multi-thread their games [PDF, 2]. This typically includes a way to pass around computation (as function pointers) with their data (as references or pointers), and a mechanism for passing back the result of the computation. Such frameworks work on most hardware, but particularly shine on homogeneous architectures like the Xbox360 or PC since it’s so easy to apply anywhere in the code with little hassle.

Even though these techniques work very well in practice, beyond separating sensory processing and path-finding calculations, the AI decision making is not typically handled in a multi-threaded fashion. Is there room for improvement here? Would multi-threading the AI’s reasoning process benefit games in terms of increased performance or improved intelligence?

[1] A Gameplay Architecture for Performance
    Terrance Cohen, Insomniac Games
    Download PDF (390 Kb)

[2] Dragged Kicking and Screaming: Source Multicore
    Tom Leonard, Valve Software
    Download PDF (235 Kb)

Motivation & Challenges

Over the last few years, the best AI systems have slowly moved towards a more uniform and automated approach to generating behavior. These days, the cutting edge systems that are being developed are typically STRIPS planners or behavior trees. (Of course, not everyone has such systems in place, but in AAA studios that emphasize solid AI, such solutions are increasingly common.)

The main motivation for this series of articles is to leverage such uniform AI frameworks to provide an equally uniform approach to multi-threading that could benefit the AI across the board – with little more work than it takes to create the AI system in the first place.

What does this mean in practice? This kind of technology could potentially:

  1. Reduce the time taken to figure out which new behavior should be executed in the current context, if there’s currently no behavior running.

  2. Decrease the cost of checking alternative courses of action to see if they are higher priority, or of guaranteeing that the current behavior is up to date.

  3. Speed up the actual execution of behaviors too, as long as they offer a certain level of parallelism (e.g. procedurally animating different body parts).

Performance of a Parallel Planner

Figure 2: Preliminary theoretical benchmarks that show the effectiveness of parallelizing typical AI code depending on its ratio of sequential to parallel logic (source, free registration required).

However, the problem with multi-threading AI computation compared to other approaches is the following:

  • There are restrictions in the order of the computations, imposed by data dependencies that need to be passed from one task to another. This rules out straightforward solutions based on data parallelism.

  • The structure of these dependencies is only fully known at runtime, while the AI is reasoning. This also rules out solutions crafted manually by programmers at development time.

Effectively, the challenge here is computing the correct dependencies, threading what’s possible and making sure the rest is executed according to the specification.


This series of articles will draw from a multitude of different technologies, mixing and matching where appropriate.


Planners

Planners can be as simple to implement as a basic graph search. If the state used by the planner is self-contained and does not rely on blocking procedures on shared variables, then the search can be parallelized by duplicating the state for every parallel search of a sub-graph.

Suggested Reading: STRIPS, F.E.A.R.’s SDK

Hierarchical Planners

Hierarchical Task Network (HTN) planners provide a useful and elegant way to model dependencies implicitly using sequences of behaviors. You can also model indirect dependencies thanks to the directed acyclic graph structure of the domains (i.e. you just link to a shared behavior). This allows you to structure your logic and the order of the computation with much less effort than it would require to model pre-conditions and side-effects.

Suggested Reading: SHOP, AI Planning Course

Behavior Trees

HTN planners are very similar to behavior trees, except that behavior trees do less planning and focus more on reactive monitoring and control — also known as reactive planning. What’s useful about behavior trees in the context of multi-threading is that they are structured as latent procedures that execute over time, and the whole logic is built to deal with any-time behavior.

Suggested Reading: Behavior Trees for Next-Gen AI

Task Schedulers

The final ingredient in the mix is a task scheduler. This technique makes it easy to implement things like behavior trees in an event-driven way, so that little time is wasted polling and querying tasks; instead, a callback mechanism helps parent tasks figure out when the child computation is done. (My article in AI Wisdom 4 shows how this can be done.)

Suggested Reading: Threading Building Blocks, AI Wisdom 4

Put everything together and you’ll start to get an idea of the multi-threaded prototypes that can be built with this technology: for example, a simple hierarchical planner built as a hierarchy of tasks, with its computation managed centrally by a scheduler.

Coming Soon…

Over the next few months, on this blog (RSS) and in the Insider’s Area (free sign-up), you’ll learn about various different prototypes that combine this technology, and find out more about the relative benefits and pitfalls. In particular, the articles will tackle:

  1. Theoretical performance boost of threading sequences and selectors in typical AI logic, when searching the whole tree (worst case).

  2. Overhead of memory allocations for tasks and copies of state, and multi-threading-friendly memory allocators.

  3. Expected performance boost in the average case, assuming that the whole tree won’t be searched exhaustively.

  4. Strategies to improve the cache coherence during multi-threading, by running similar computations in the same thread.

The first article is already online here! If you have any suggestions or comments, feel free to post them below!
