Many of you have brought up excellent questions (in the comments on the blog and in the forums) about behavior trees since my talk at GDC. Over the next few weeks, I’ll take the time to answer some of them from personal experience, but also using the wealth of practical and academic knowledge on the subject where appropriate.
In particular, this week’s Thursday Theory post looks into teleo-reactive programs, as proposed by Nils J. Nilsson from the Artificial Intelligence Laboratory at Stanford University.
I consider the paper reviewed in this article to be a landmark in the field of robotics (like PRS). This one is particularly interesting for game AI because it provides some critical insights in how to design responsive behavior trees.
Traditionally in robotics, when you design a hardware circuit to control the robot, you get the luxury of continuous feedback as the inputs to the circuit are always be connected to the latest sensory information. However, as Nilsson points out, this isn’t the case with software:
“Some of the central ideas of computer science, namely sequences, events, discrete actions, and subroutines, seem at odds with the notion of continuous feedback. For example, in conventional programming when one program calls another, the calling program is suspended until the called program returns control. This feature is awkward in applications in which the called program might encounter unexpected environmental circumstances with which it was not designed to cope. In such cases, the calling program can regain control only through interrupts explicitly provided by the programmer.”
Of course, when designing the AI logic for a robot that needs to respond to changes in the world, this becomes a problem. This paper address this by:
Designing a representation for making logic more responsive while still being goal-directed.
Providing methodologies and formalisms for helping design this logic effectively.
In particular, the Teleo-Reactive Programs try to emulate the more continuous behavior of a circuit.
Figure 1: A circuit representing a teleo-reactive sequence.
The major contribution of this paper is the concept of a dynamically adaptible goal-directed behavior:
“A teleo-reactive (T-R) sequence is an agent control program that directs the agent toward a goal (hence teleo) in a manner that takes into account changing environmental circumstances (hence reactive).”
T-R sequences are built as an ordered set of production rules, expressed as condition/action pairs. These are evaluated in order from first to last, in a fashion typical for rule-based systems. As soon as a condition matches, its action is executed.
There are, however, key differences:
The actions can be durative, which means they continue indefinitely (for example, MoveForwards.)
The conditions are evaluated continuously, so at each update the highest priority action that’s applicable is run.
These T-R sequences are typically built backwards from the goal, with the simplest pair of condition/action appearing first and others following in order of complexity. For example:
If the goal is complete, then no action is required.
If the goal is a step away, take the according primitive action.
If multiple steps are required, then take the first such that the agent would become a step closer to the goal to activate a more specific rule.
T-R sequences, depending on how they are built, have certain properties such as completeness (the conditions cover all cases) and the regression property (conditions become more specific as the priority increases). Both these properties are necessary if you want to build universal T-R sequences that can reach a goal from any starting situation.
Figure 2: A teleo-reactive sequence of condition-action pairs.
Abstract & References
Here’s the abstract for the paper:
“A formalism is presented for computing and organizing actions for autonomous agents in dynamic environments. We introduce the notion of teleo-reactive (T-R) programs whose execution entails the construction of circuitry for the continuous computation of the parameters and conditions on which agent action is based. In addition to continuous feedback, T-R programs support parameter binding and recursion.
A primary difference between T-R programs and many other circuit-based systems is that the circuitry of T-R programs is more compact; it is constructed at run time and thus does not have to anticipate all the contingencies that might arise over all possible runs. In addition, T-R programs are intuitive and easy to write and are written in a form that is compatible with automatic planning and learning methods. We briefly describe some experimental applications of T-R programs in the control of simulated and actual mobile robots.”
You can download the paper from the JAIR website (PDF, 253 Kb):
Teleo-Reactive Programs for Agent Control Nils J. Nilsson Journal of Artificial Intelligence Research, 1994.
It’s interesting to note that Halo 2 uses this approach for building its behavior trees. It turns out to be very useful in many different levels of the tree. I call these dynamic selectors, as these selectors don’t just run once then terminate. Instead, during execution, they keep trying higher-priority child tasks to see if they are applicable instead.
In practice, this is very useful for dealing with events that open up new opportunities or threats. Here’s how it works:
Build high-priority behavior (e.g. attack) to deal with special cases (e.g. intruder alert). This would be a sequence starting with a condition.
Insert this behaviors with a higher-priority in a dynamic selector, so it gets rechecked on a regular basis.
When the pre-condition at the start of the sequence matches, the whole behavior starts, and the previously active behavior would be aborted.
Conceptually it’s quite simple, but there are hidden details that quickly get complicated, in particular in terms of control flow and the linear ordering of the behaviors. This can make it hard to implement a dynamic selector in a way that everyone agrees on the specification.
For this reason, I prefer to break down the role of a dynamic selector into simpler primitives where possible. This would involve using a cascade of static selectors, with each subtree including parallels to monitor its assumptions (both to keep the tree valid, and check for better opportunities, as I explained here). You can then use a build-time transformation to turn your dynamic selectors into this lower-level form of logic that’s more formally specified.
Screenshot 3: Test environment for TRP called Bot World.
Back to the paper itself, here’s how I think it ranks in practice:
- Applicability to games: 9/10
- Despite being invented in the context of robotics, this technology is applicable to games. Next-gen games in particular are increasingly dynamic, so anything that can help deal with these changes in an efficient way is more than welcome.
- Usefulness for character AI: 10/10
- The concepts behind T-R programs are not only applicable to game AI, but they’re pretty much a requirement. If your system can’t emulate these features, then why not borrow these ideas directly?
- Simplicity to implement: 9/10
- Certain aspects of the implementation described by Nilsson, like the dynamic binding of parameters within T-R programs, and runtime instantiation, may require a certain experience with meta-programming. However, generally, the concepts in this paper are very accessible and easy to insert into behavior trees at least.
The approach of teleo-reactive programs is important to understand as it provides great mechanisms for creating dynamically adaptive behaviors. It’s useful for dealing with high-priority events, bailing out of subtrees of low-importance, and refocusing on different behaviors that are more appropriate.
It’s also an useful way to structure goal-directed behaviors in a way that’s very responsive to dynamic changes. This approach can’t be used too frequently though as it may become inefficient to have all levels in the hierarchy behave as T-R sequences. So it’s important to develop a sense of where this pattern is necessary in a behavior tree, and using it accordingly.
Does your system support the concept of teleo-reactive programs? Any other thoughts on this technology?