Later today at the 2010 Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE '10), Alexander Kring will be giving a presentation about our joint paper on hierarchical pathfinding, which Nick Samarin implemented in the AI Sandbox. The paper is titled “DHPA* and SHPA*: Efficient Hierarchical Pathfinding in Dynamic and Static Game Worlds” and should be available soon. We developed two modified versions of the HPA* algorithm that work better in dynamic and static environments respectively.
I think this paper is worth discussing, not only for the ideas behind the research, but for the doors that it opens up. There's a huge amount of research to be done into performance on multi-core architectures, as well as job/task processors — whether they use some kind of intelligent scheduling or a manually-tuned processor allocation. Because it's such a well-known problem, pathfinding is one of the best places to start when improving the parallelization of game AI.
Screenshot 1: Hierarchical graph of a 2D grid-based map, used to perform HPA*. This is a screenshot of our version optimized for static environments, also known as SHPA*.
Background on HPA*
HPA* is an algorithm for hierarchical pathfinding. It breaks the original grid down into areas and creates a graph from those areas. To search, you run an A* within the areas to connect your start and end points to the high-level graph, then run a first-pass search at the high level, and finally refine the path using A*.
The major problem with HPA* is the way the algorithm is specified; you need to do one of the following:
Lock the graph's data-structure when it's changed to make sure that you only run one search at a time, or
Implement an extra level of indirection to hide the problem of inserting the start- and end-points into the graph.
Both of these result in extra work when programming the algorithm, and they also cause huge problems for performance. You could argue (like one of our reviewers did) that HPA* doesn't force you to implement it this way, but in practice the way the algorithm is defined has led practitioners in industry to implement it using these two approaches — neither of which scales well.
The solution to getting around this particular problem is actually simple. It consists of two parts: one trick for initializing the search that I used before on a strategic pathfinder (PREMIUM) and another trick that Alexander uses — and has used in the past — for terminating the search. Here's what you need to do:
When initializing the search, you don't insert the start and end nodes into the high-level graph; you keep the graph read-only for performance. Instead, you drop a bunch of search items into your A* algorithm's open list, each replacing a connection that you would otherwise have established in the graph. It's a little less optimal theoretically, but it saves you lots of messing around with graph changes, so it pays off in terms of code simplicity and overall performance.
When searching the high-level graph and checking the termination condition within A*, you check if the current node's area matches the parent area of your goal's low-level node. This is again slightly less optimal theoretically; you should search the low level to see which entrance into the area is best... Despite this, the paths are only a few percent sub-optimal in practice, and you hardly notice the difference!
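To make these two tricks concrete, here's a minimal sketch in Python. It's illustrative only: the graph layout, function name, and cost model are my own assumptions, not the paper's code. It seeds the open list with the entrances of the start area instead of mutating the graph, and terminates as soon as it pops any entrance belonging to the goal's area:

```python
import heapq
import math

def hierarchical_search(graph, areas, positions, start, goal):
    """A* over a read-only high-level graph of area entrances.

    graph:     {node: [(neighbour, cost), ...]}, entrance nodes only
    areas:     {node: area_id}, the area each entrance belongs to
    positions: {node: (x, y)}, used for the Euclidean heuristic
    start, goal: ((x, y), area_id) low-level points, never inserted
    into the graph. Returns an estimated path cost, or None.
    """
    (sx, sy), start_area = start
    (gx, gy), goal_area = goal

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    # Trick 1: instead of inserting the start node into the graph,
    # seed the open list with one item per entrance of the start area.
    open_list, g_cost = [], {}
    for node, area in areas.items():
        if area == start_area:
            g = dist((sx, sy), positions[node])
            g_cost[node] = g
            heapq.heappush(open_list, (g + dist(positions[node], (gx, gy)), node))

    closed = set()
    while open_list:
        _, node = heapq.heappop(open_list)
        if node in closed:
            continue
        closed.add(node)
        # Trick 2: terminate when the current node's area matches the
        # goal's area, rather than inserting the goal into the graph.
        if areas[node] == goal_area:
            return g_cost[node] + dist(positions[node], (gx, gy))
        for nbr, cost in graph.get(node, []):
            g = g_cost[node] + cost
            if g < g_cost.get(nbr, math.inf):
                g_cost[nbr] = g
                heapq.heappush(open_list, (g + dist(positions[nbr], (gx, gy)), nbr))
    return None
```

On a toy graph with two areas joined by a single edge, the search returns the seed cost plus the edge cost plus the exit cost without ever touching the graph structure.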
Based on these little search tricks, you can use different representations to make it theoretically less sub-optimal and more efficient in practice. These optimizations fall into two categories, depending on whether the environments are static or not.
Screenshot 2: The areas that are automatically generated based on the obstacles and the underlying grid. Note the areas are fully self-connected, and tend to follow the shape of the world.
When I started working on the hierarchical graph for the AI Sandbox, I was experimenting with some simple area clustering algorithms. I'd used this before and talked about it briefly (PREMIUM) at the Paris Game AI Conference last year, and it turned out that William van der Sterren had taken those ideas much further (see his masterclass on terrain analysis, also PREMIUM).
The idea is that you merge together nodes/cells/waypoints in your low-level representation according to a heuristic, and automatically create areas — as shown in this video on the blog. The magic is in the heuristic, but these can be surprisingly simple in practice. I mentioned the importance of having a good combination of low-level heuristics that merge areas together based on how strongly connected they are, and a high-level heuristic that makes sure the graph is never too densely connected. In the AI Sandbox, Nick Samarin experimented with different approaches and we settled on a simple heuristic that merges strongly connected areas but applies a maximum area size as a veto.
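As a rough illustration of that kind of heuristic (all names here are hypothetical, and this is far simpler than what Nick implemented), the sketch below repeatedly merges the pair of adjacent areas with the longest shared border, and vetoes any merge that would exceed a maximum area size:

```python
def cluster_grid(walkable, max_area_size):
    """Greedy area clustering over a set of (x, y) walkable cells.

    Each cell starts as its own area (tracked with a union-find);
    the most strongly connected pair of adjacent areas is merged
    until every remaining merge would exceed max_area_size.
    Returns {cell: area_id}.
    """
    parent = {c: c for c in walkable}

    def find(c):
        # Path-halving union-find lookup.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    size = {c: 1 for c in walkable}
    changed = True
    while changed:
        changed = False
        # Measure the shared border length between adjacent areas.
        border = {}
        for (x, y) in walkable:
            for nbr in ((x + 1, y), (x, y + 1)):
                if nbr in walkable:
                    a, b = find((x, y)), find(nbr)
                    if a != b:
                        key = (a, b) if a < b else (b, a)
                        border[key] = border.get(key, 0) + 1
        # Merge the most strongly connected pair that fits the veto.
        for (a, b), _ in sorted(border.items(), key=lambda kv: -kv[1]):
            if size[a] + size[b] <= max_area_size:
                parent[b] = a
                size[a] += size[b]
                changed = True
                break
    return {c: find(c) for c in walkable}
```

Recomputing every border on each merge is exactly why this style of clustering is slow, which is the problem the next paragraph describes.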
The big problem with this approach is that it's slow; you basically can't get it much faster than 4s-5s on normal-sized maps using clustering. I'm sure we could use a different approach to optimize this, but it wouldn't drop much lower without compromising the quality of the graph. (The algorithm is surprisingly good at finding rooms and doorways.) Regardless, the performance problems limit the applicability of this algorithm to static worlds — or at least worlds that don't change very often.
The benefit you get out of this clustering is a nicely connected high-level graph that fits your geometry very well. You know that the nodes in each area are fully connected to each other, and you don't have to worry about disconnections. Because you can spend more time preparing this high-level area graph, you could even optimize it to make sure search performance is good. In Paris, I mentioned using around 64-96 areas in a production-level strategic graph, and that was tuned based on memory and performance tradeoffs.
Once you have this graph, you can do a very fast search. Within the areas, since you know they are fully connected, you can use a simple Euclidean heuristic. This is very similar to what Nathan Sturtevant did in DRAGON AGE; watch his full interview here (PREMIUM), or read the white paper he wrote:
Memory-Efficient Abstractions for Pathfinding, Nathan Sturtevant, AIIDE 2007. Download PDF
Screenshot 3: Regular grid-based solutions, despite their areas not being fully connected, can be easier and faster to work with in rapidly changing environments.
Obviously, even a technique that takes a few seconds to process is not ideal for dynamic worlds. You can no doubt make some assumptions, like repairing the areas locally as obstacles move around. Chris Jurney managed this in COMPANY OF HEROES (see his Dealing with Destruction talk), where he repairs the high-level sectors manually. By spreading larger graph repairs over multiple frames, it should be fast enough for real-time use as long as you're not doing searches while repairing.
For the AI Sandbox, we chose to try a different approach, partly based on Chris Jurney's comments in the hierarchical annotated A* masterclass (PREMIUM) with Daniel Harabor. We went back to the regular grid that HPA* uses. However, to make it even more efficient in dynamic worlds and avoid the problems mentioned above, we used the following techniques:
We build a so-called "spanning tree" within each area to make it efficient to connect up nodes in the area to the high-level graph. This is done using Dijkstra's algorithm, which allows us to calculate the distance from the various entrances and exits of an area to all the nodes contained within that area in one pass — rather than having to run A* multiple times.
Running this Dijkstra algorithm also provides us with enough information to repair or adjust the high-level graph when there are obstacles moving in that area. If an entrance to an area is no longer connected to an exit, then it'll be found via this search and the graph will be updated accordingly.
The spanning tree itself is basically cached as a grid of indices, so depending on the maximum size of each area this doesn't need to be very big. We also cache the distance to each exit so it's even quicker to look up when starting the hierarchical search.
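Here's a hedged sketch of the Dijkstra pass over one area (the 8-connected grid and the function name are my own assumptions, not the paper's code). A single run from an entrance yields the distance to every cell in the area, and an exit missing from the resulting table indicates a connection that needs repairing:

```python
import heapq
import math

def area_distance_table(cells, entrance):
    """Dijkstra within one area of an 8-connected grid.

    cells:    set of (x, y) cells belonging to this area
    entrance: a cell of the area to start from
    Returns {cell: shortest distance from entrance}; cells missing
    from the table are unreachable, which flags a broken entrance-to-
    exit connection that the high-level graph should repair.
    """
    dist = {entrance: 0.0}
    open_list = [(0.0, entrance)]
    while open_list:
        d, (x, y) = heapq.heappop(open_list)
        if d > dist.get((x, y), math.inf):
            continue  # stale queue entry
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx == dy == 0:
                    continue
                nbr = (x + dx, y + dy)
                if nbr not in cells:
                    continue
                nd = d + math.hypot(dx, dy)  # 1.0 straight, ~1.414 diagonal
                if nd < dist.get(nbr, math.inf):
                    dist[nbr] = nd
                    heapq.heappush(open_list, (nd, nbr))
    return dist
```

Caching one such table per entrance is what makes the later hierarchical searches so quick to initialize, at the cost of a grid of distances per area.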
Combined, these tricks give us a huge speed boost over HPA*, particularly for short searches, which is where it matters most. Indeed, many commercial pathfinder implementations are incremental, so you often only request the start of the path and the destinations are nearby.
Screenshot 4: The graph based on the regular grid shown above. This is dynamically repaired whenever an obstacle moves in the world.
You might be thinking to yourself: "What's so great about this? It sounds obvious." And you'd be entirely right! This paper is probably one of the least 'revolutionary' to be presented at AIIDE this year, but at the same time it's also one of the more solid and industry-ready. In fact, both Alexander and I have shipped individual components of these algorithms in respective games — though not on a 2D grid.
Beyond that, I personally think this research is fascinating for a few reasons:
It splits up the hierarchical pathfinding problem into static and dynamic worlds. Both of those are very different and must be optimized very differently. Just that separation will help classify and evaluate future research better.
It shows how to remove the multi-threading problems with HPA* so we can finally start addressing modern pathfinding issues. Many pathfinder implementations in games are mainly serial these days, though maybe offloaded to a separate thread.
It points the way forwards for research in terms of finding a trade-off between memory usage and computation, particularly in the context of multi-threaded and multi-core architectures.
Recently Jad Nohra, who wrote the original HPA* implementation for the AI Sandbox, also benchmarked the performance of the implementation, measuring the effects of graph size, map complexity and search length on execution time. You can find the article online on Intel's site.
“With many AI problems, you get to decide when to perform a calculation, not just what and how.”
All of this, from the AIIDE paper to the article and the research behind both, points towards the next big low-level problem in game AI: how to efficiently leverage the underlying multi-core hardware. With AI, as with many other aspects of game development, you can pick your battles, your data structures, and your algorithms accordingly. But there's more to it in the case of AI and pathfinding; you can decide when you want to schedule your computation, since the information isn't absolutely required to render the frame. This flexibility adds another dimension to the problem: you must choose when best to perform your pathfinding requests based on what's going on, on other potential requests, and on how much computation you have to do already.
Then there's the issue of deciding whether it's worth doing a standalone hierarchical pathfind even if it's much slower than a batch done later on, for example if you particularly need the information this very frame. And if you decide to process such high-priority queries despite their inefficiency, can you leverage the resulting data somehow to improve the AI? There are also questions of finding compromises between memory and computation. How much information does it make sense to keep over multiple frames? Should you discard your entire cache each frame? If the data was calculated on an SPU, for example, it might not even be worthwhile sending it back — you just need the result.
Our paper and the solution it presents only scratch the surface. They show the benefits of batching overall, but don't do so in a context-sensitive way. Regardless, I hope you now realize the importance of this topic, and the potential for future research here. Pathfinding is a good place to start since many researchers are familiar with it, but this applies equally well to line-of-sight and other sensory queries, or even high-level reasoning on multiple cores... Exciting times ahead!