Learning to Move Autonomously in a Hostile World

Alex J. Champandard on November 15, 2007

This week’s Thursday Theory post looks into applying reinforcement learning to bridge the gap between animation control and high-level AI logic. Specifically, this review covers autonomous characters that learn to move in a dynamic world, as developed by Leslie Ikemoto at the University of California, Berkeley.


AI logic is mostly built top-down. Designers build behaviors by recursive decomposition, and at the lowest level animations are put in place to achieve the desired effect. Often, the AI takes absolute control over the animation, using a procedural mover to reposition the skeleton in space, so the results don’t look great.

When you look at the problem bottom-up, based on the animations available and how they can be blended together with high quality, the potential for realistic behaviors is much higher. However, the problem is very different; at any stage while playing an animation:

  1. How do you pick the next animation according to high-level goals?

  2. In what way should the AI logic be structured to facilitate this?

The paper applies reinforcement learning to these problems. It’s particularly useful to find out whether the results are suitable for commercial game AI.

Two Autonomous Characters

View or download the video (MOV, 153 Mb)


There are three ideas at the core of this paper, which allow it to generate realistic animations that are goal driven:

  1. Use a motion graph at the base, then use a value function to decide which transitions to take.

  2. Apply reinforcement learning to approximate the value function automatically from high-level goals.

  3. Rely on a global planner to compute paths dynamically in the world.
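The first idea above can be sketched concretely. Here is a minimal, hypothetical Python illustration of choosing transitions at a branch point in a motion graph by maximizing a value function; the clip names, the graph, and the toy value function are all made up for the example, not taken from the paper:

```python
# Hypothetical motion graph: each clip maps to the follow-up clips
# that can be blended onto it seamlessly.
MOTION_GRAPH = {
    "walk":   ["walk", "jog", "crouch"],
    "jog":    ["jog", "walk", "sprint"],
    "sprint": ["sprint", "jog"],
    "crouch": ["crouch", "walk"],
}

def choose_transition(current_clip, state, value_fn):
    """At a branch point, pick the successor clip with the highest
    estimated value for the agent's current state."""
    candidates = MOTION_GRAPH[current_clip]
    return max(candidates, key=lambda clip: value_fn(state, clip))

# Toy value function: move fast when no enemy is visible,
# stay slow and low when one is.
def toy_value(state, clip):
    speed = {"crouch": 0.3, "walk": 1.0, "jog": 2.0, "sprint": 3.5}[clip]
    return -speed if state["enemy_visible"] else speed

choose_transition("walk", {"enemy_visible": True}, toy_value)  # -> "crouch"
```

In the paper the value function is not hand-written like `toy_value`; it is a parametric function learned from the reward signal, as described below.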

The state space can grow to be very big, however. This is resolved using:

  • A simple weighted reward function that punishes collisions with obstacles or being seen by enemies, and rewards following a dynamically chosen path.

  • A model of the actor’s state, based on a discretized grid around the agent. Each bin stores a vector of information about obstacles and enemies.

  • A stochastic optimization technique that computes a policy for selecting the right animation based on the reward signal.
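The first two points can be illustrated with a short sketch. The grid size, bin extent, and reward weights below are placeholder values chosen for the example, not the ones used in the paper:

```python
# Assumed local state model: a coarse square grid centred on the agent;
# each bin holds a small vector (obstacle count, enemy count).
GRID_SIZE = 4        # 4x4 bins around the agent
BIN_EXTENT = 2.0     # world units per bin

def featurize(agent_pos, obstacles, enemies):
    """Build the discretized local state: one (obstacle, enemy) vector
    per grid bin, positioned relative to the agent."""
    bins = [[0.0, 0.0] for _ in range(GRID_SIZE * GRID_SIZE)]
    half = GRID_SIZE * BIN_EXTENT / 2.0

    def bin_index(p):
        dx, dy = p[0] - agent_pos[0], p[1] - agent_pos[1]
        if abs(dx) >= half or abs(dy) >= half:
            return None          # outside the agent's local grid
        ix = int((dx + half) / BIN_EXTENT)
        iy = int((dy + half) / BIN_EXTENT)
        return iy * GRID_SIZE + ix

    for p in obstacles:
        i = bin_index(p)
        if i is not None:
            bins[i][0] += 1.0
    for p in enemies:
        i = bin_index(p)
        if i is not None:
            bins[i][1] += 1.0
    return [v for b in bins for v in b]   # flat feature vector

# Assumed weighted reward: punish collisions and being seen,
# reward progress along the planner's path.
W_COLLIDE, W_SEEN, W_PATH = -10.0, -5.0, 1.0

def reward(collided, seen_by_enemy, path_progress):
    return (W_COLLIDE * collided) + (W_SEEN * seen_by_enemy) \
         + (W_PATH * path_progress)
```

Because the grid is agent-relative, the same feature layout applies anywhere in the world, which is what keeps the state space tractable.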

By modeling the actor’s state locally and approximating the whole state space, reinforcement learning becomes feasible. Thanks to the stochastic optimization algorithm, the value function can be approximated with sufficient accuracy to drive the animations in a goal-directed fashion.
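As a rough illustration of stochastic optimization by random sampling, here is a simple hill climber over a weight vector; the synthetic `evaluate` function stands in for running the controller in simulation and summing rewards, and is an assumption of this sketch rather than the paper’s actual procedure:

```python
import random

def evaluate(weights):
    """Stand-in for simulating the controller with these value-function
    weights and accumulating reward; scores against a fixed target."""
    target = [0.5, -1.0, 2.0]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def stochastic_search(dim=3, iterations=200, step=0.5, seed=1):
    """Random-sampling hill climber: perturb the weight vector and keep
    the perturbation only when the estimated return improves."""
    rng = random.Random(seed)
    best = [0.0] * dim
    best_score = evaluate(best)
    for _ in range(iterations):
        candidate = [w + rng.gauss(0.0, step) for w in best]
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

weights, score = stochastic_search()
```

Random sampling like this trades accuracy for simplicity: each iteration only needs to run the controller and compare scores, with no gradients required.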

Abstract and References

Here’s the abstract of the paper:

This paper describes a framework for controlling autonomous agents. We estimate an optimal controller using a novel reinforcement learning method based on stochastic optimization. The agent’s skeletal configurations are taken from a motion graph which contains seamless transitions, guaranteeing smooth, natural-looking motion.

The controller learns a parametric value function for choosing transitions at the branch points in the motion graph. Since this query can be completed quickly, synthesis is performed online in real-time. We couple the local controller with a global path planner to create a system which produces realistic motion even in a rapidly changing environment.

You can download the paper from the site (PDF, 10.4 Mb).

Learning to Move Autonomously in a Hostile World
L. Ikemoto, O. Arikan, D. Forsyth
SIGGRAPH 2005 Technical Sketch

Screenshot 2: Moving through dynamic obstacles.

Evaluation & Discussion

While this project seems to be at an early stage, it’s robust enough to apply to games already, and it points in an interesting direction for research.

Applicability to games: 5/10
As it is, this system can be applied to games with animation graphs, where the world representation can be kept simple. It’s likely it won’t scale very well, but it certainly points towards fertile ground: combining stochastic planners with reinforcement learning to find the best balance of memory and computation.
Usefulness for character AI: 6/10
This paper is interesting because it shows how to build the high-level AI logic automatically based on the low-level animations, simply using a reward function. Such techniques are ideal for generating extremely realistic motion with some high-level guidance, if not completely purposeful AI. The paper mentions some interesting strategies for modeling the state of actors locally using grid bins that store state vectors; this is a useful idea for any sensory system.
Simplicity to implement: 7/10
The algorithm does not rely on any specific motion-graph technology, so almost any implementation will do. Of course, the better the graph, the more realistic the motion. As for the reinforcement learning algorithm, it is based on random sampling so it’s not particularly hard to understand or computationally expensive (at the cost of accuracy).

What do you think about the applicability of this algorithm to games, and to character AI in particular?

Discussion 1 Comments

Dave Mark on November 16th, 2007

Wow. Excellent video. I'm looking forward to having the time to look through the actual paper. (I'm starting to plan reading material for my Thanksgiving weekend flights and the inevitable downtime.) I would like to think that we are going to soon be at the point where this sort of approach can be used on a larger scale.
