
The Crysis of Integrating Next-Gen Animation and AI

Alex J. Champandard on December 10, 2007

Next-gen games are proving to be quite a challenge to develop! From hearing Crytek’s Cevat Yerli talk at GDC in Lyon, it was clear that Crysis is based on some of the most advanced technology available in the games industry. Look no further than the animation system! But even that has its problems…

In particular for Crysis, Yerli wanted to make the characters feel much more real and connected to the world. To do this, it was necessary to reduce the amount of foot-sliding in the animations, a problem that makes the characters less believable. These issues are so common in games that we almost don’t notice them anymore!

That said, it’s possible to solve this problem with the latest animation technology, but working on the cutting edge is a challenge, particularly when integrating animation and AI. Even with fully working animation prototypes, it’s still a huge challenge to ship this kind of technology in game. As Yerli pointed out in his keynote:

“If there’s one area in Crysis where I think we failed it’s this one.” — Cevat Yerli, talking about problems with the advanced animation system.

In this article, I’ll take a detailed look at the kind of technology that’s used in CryEngine 2 for AI and animation, covering the basic concepts from my experience working with very similar systems at Rockstar. I’ll also go a bit further into juicy gossip and informed speculation to analyze exactly why it’s so difficult to get it working in game generally — not just for Crytek.

Motion Warping

Screenshot 1: Advanced motion warping in CryEngine2.

Dynamic Systemic AI

The design motto of Crysis, dating back to FarCry, is that every battle should be unique, as Yerli emphasized in his keynote. The idea is to use large environments that invite variation, as well as an AI that offers diversity without necessarily relying on randomness, rather like Halo’s AI does. (It uses the player’s behavior as the only source of uncertainty, which makes the actor behaviors less monotonous but still predictable.)

At Crytek, they call this dynamic systemic AI. It’s their version of the typical AI implementation in shooters these days, which lets the behaviors be a little more emergent rather than fully scripted. Here’s how you do this in practice (a rough code sketch follows the list):

  • Build an overall AI architecture with a sensory system.

  • Allow the logic to be customized modularly for each AI.

  • Use scripts to implement purposeful reactions to events.

  • Let the behavior emerge from the logic interacting with the world.
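To make that more concrete, here’s a minimal C++ sketch of what such an architecture could look like. None of the names (Agent, SensoryEvent, Behavior) come from CryEngine 2; in Crytek’s case the reactions would live in Lua scripts rather than C++ lambdas, but the shape is the same: a sensory system pushes events, each agent registers its own modular reactions, and the variation emerges from which events the world actually produces.

    // Hypothetical names throughout; this is an illustration, not Crytek's API.
    #include <functional>
    #include <iostream>
    #include <map>
    #include <string>

    enum class Stimulus { SawEnemy, HeardGunfire, LostTarget };

    struct SensoryEvent {
        Stimulus type;
        float x, y;                       // where the stimulus came from
    };

    struct Behavior {
        std::string name;                 // e.g. "TakeCover"
        std::function<void(const SensoryEvent&)> react;
    };

    class Agent {
    public:
        // The logic is customized modularly: each agent registers its own reactions.
        void addBehavior(Stimulus trigger, Behavior b) { behaviors_[trigger] = std::move(b); }

        // The sensory system calls this; variation emerges from which events the
        // world (and the player) actually produces, not from added randomness.
        void onEvent(const SensoryEvent& e) {
            auto it = behaviors_.find(e.type);
            if (it != behaviors_.end()) it->second.react(e);
        }

    private:
        std::map<Stimulus, Behavior> behaviors_;
    };

    int main() {
        Agent soldier;
        soldier.addBehavior(Stimulus::SawEnemy, {"TakeCover",
            [](const SensoryEvent& e) { std::cout << "take cover from (" << e.x << ", " << e.y << ")\n"; }});
        soldier.addBehavior(Stimulus::HeardGunfire, {"Investigate",
            [](const SensoryEvent& e) { std::cout << "investigate noise at (" << e.x << ", " << e.y << ")\n"; }});

        soldier.onEvent({Stimulus::SawEnemy, 10.0f, 4.0f});
        soldier.onEvent({Stimulus::HeardGunfire, 2.0f, 7.0f});
    }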

Of course, there’s a lot more to this, but it’s rather similar to other games with modern AI (see these technical reviews as a reference). Also, here’s an interview with Christopher Natsuume, producer of FarCry, which goes through the basics of how the AI works. These ideas were expanded for Crysis, but the game is still based on the same idea of customizable Lua scripts, according to the CryEngine 2 specifications [1]:

“Allows complex AI behaviors to be created without requiring new C++ code, including extending state machine behaviors from LUA scripts.”

This kind of AI provides the basis of the emergent gameplay in the trademark “sandbox levels” of Crytek, but it’s also responsible indirectly for controlling the animation.

Scripted Smart Objects

Screenshot 2: Scripts used for smart objects in CryEngine 2.

Animation Dreams…

As Yerli mentioned in his keynote, Crytek was very keen to play with the “theatre of the player’s mind,” since ultimately, it’s the most powerful tool game developers have at their disposal. A large part of achieving this comes down to fooling the player into believing that every actor is part of the world, and it helps if there’s no noticeable foot-sliding (a.k.a. foot-skating).

This animation problem happens in two cases:

  • Moving the animation around in space procedurally, and

  • Doing a naïve blend by simply interpolating keyframes.

Technologically speaking, it’s possible to reach zero foot-skating, as long as your animators also clean up the original motion-capture animations to be perfect. Then you can get clean blends by lining up and synchronizing the animations while blending them, which introduces no extra foot-sliding.
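To make the synchronization idea concrete, here’s a small, self-contained C++ sketch of phase-synchronized blending between two locomotion cycles, with the pose reduced to a single placeholder value. The clips, numbers, and names are all invented for illustration; the point is that sampling both clips at the same normalized phase of the gait cycle is what keeps the foot plants lined up, whereas blending unsynchronized local clip times is where foot-skate creeps in.

    // Illustrative only: a 1D "pose" stands in for a real skeleton.
    #include <cmath>
    #include <iostream>
    #include <vector>

    struct Clip {
        float duration;               // seconds for one full gait cycle
        std::vector<float> samples;   // pose parameter sampled uniformly over the cycle

        float sampleAtPhase(float phase) const {          // phase in [0, 1)
            const int n = static_cast<int>(samples.size());
            float t = phase * n;
            int i0 = static_cast<int>(t) % n;
            int i1 = (i0 + 1) % n;
            float frac = t - std::floor(t);
            return samples[i0] * (1.0f - frac) + samples[i1] * frac;
        }
    };

    // Advance one shared phase using the *blended* cycle duration, then sample
    // both clips at that same phase before interpolating.
    float blendedPose(const Clip& a, const Clip& b, float weight, float& phase, float dt) {
        float blendedDuration = a.duration * (1.0f - weight) + b.duration * weight;
        phase = std::fmod(phase + dt / blendedDuration, 1.0f);
        return a.sampleAtPhase(phase) * (1.0f - weight) + b.sampleAtPhase(phase) * weight;
    }

    int main() {
        Clip walk{1.2f, {0.0f, 0.3f, 0.6f, 0.3f}};
        Clip run {0.6f, {0.0f, 0.8f, 1.6f, 0.8f}};
        float phase = 0.0f;
        for (int frame = 0; frame < 5; ++frame)
            std::cout << blendedPose(walk, run, 0.5f, phase, 1.0f / 30.0f) << "\n";
    }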

The theory behind this kind of animation synthesis is often referred to as parametric motions. This paper on motion graphs in particular is the source of most implementations in the games industry these days:

Motion Graphs
L. Kovar, M. Gleicher, and F. Pighin
Proceedings of ACM SIGGRAPH, 2002.
Download (PDF, 771 Kb)

Also, follow up by reading the paper on Parametric Motion Graphs to get an idea of how to blend animations together correctly. You can read about other improvements in the section on character animation at AiGameDev.com too if you’re interested in other derived approaches.
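As a rough illustration of the motion-graph idea (a simplification, not the paper’s actual implementation), here’s a tiny C++ sketch: clips are broken at frames that match well, those frames become graph nodes connected by transition edges, and synthesis is a walk through the graph, so every pose played back comes from real captured motion. The clip names, costs, and greedy policy below are all made up.

    #include <iostream>
    #include <string>
    #include <vector>

    struct Edge {
        int         targetNode;   // node reached by playing this clip segment
        std::string clipSegment;  // e.g. "walk_frames_12_40"
        float       cost;         // how well the poses match at the transition
    };

    struct Node { std::vector<Edge> edges; };

    // Greedily follow the cheapest transition for a few steps; a real system
    // would search the graph for a path that satisfies the AI's goal.
    void synthesize(const std::vector<Node>& graph, int start, int steps) {
        int current = start;
        for (int i = 0; i < steps && !graph[current].edges.empty(); ++i) {
            const Edge* best = &graph[current].edges[0];
            for (const Edge& e : graph[current].edges)
                if (e.cost < best->cost) best = &e;
            std::cout << "play " << best->clipSegment << "\n";
            current = best->targetNode;
        }
    }

    int main() {
        std::vector<Node> graph(3);
        graph[0].edges = {{1, "walk_cycle", 0.1f}, {2, "walk_to_stand", 0.4f}};
        graph[1].edges = {{0, "turn_left_90", 0.3f}, {2, "walk_to_stand", 0.2f}};
        graph[2].edges = {{0, "stand_to_walk", 0.2f}};
        synthesize(graph, 0, 5);
    }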

This is the kind of technology that I worked on at Rockstar for their internal middleware called R.A.G.E., which was the basis of R* Table Tennis [2] and soon GTA 4 [3]. Now for the last bit of (public) gossip, one of the animation programmers who worked on this within Rockstar subsequently moved to Crytek… so the rest of this analysis should be relatively accurate!

Parametric Motion

Screenshot 3: Parametric motions without foot-skate in CryEngine2.

Meanwhile, Back in the Real World!

Crytek effectively had this animation technology working fine as an isolated prototype, but in the game itself there were many problems. As Yerli mentioned in his talk at GDC Lyon, they just couldn’t get it into the game to the state where it met the requirements of the AI design.

I’ve worked on this animation technology before as well as the AI to control it, and we ran into similar problems (that game prototype was not released :-) Here’s how things went wrong step by step, and presumably how things happened at Crytek:

  1. Animation Team: “We can build this awesome animation technology that has no foot-skating.”

  2. Producer: “Sounds cool; let’s do it!”

  3. Behavior Team: “How’s that going to work for the AI?”

  4. Animation Team: “Don’t worry, we’ll retro-fit the new technology into the current API.”

  5. Behavior Team: “Ok, that’s great. The current system is pretty responsive!”

Did you spot the problem already? If not, here’s how the end of the story plays out.

  1. Producer: “So how’s this awesome technology coming along?”

  2. Animation Team: “Good, we’re about to integrate it into the game for the AI.”

  3. Behavior Team: “Hmm. The AI behaves completely differently now; the timing is off.”

  4. Animation Team: “That’s easy to fix, all we need is more motion capture data and a few fixes in the AI.”

  5. Producer: “No time for that; the game ships in a few months! Do whatever it takes to get the AI working.”

The problem, of course, is that by default your AI will behave much more sluggishly if you constrain it to what your motion capture can do. This is especially a problem if you don’t have many animations to provide better responsiveness. So in practice, your soldiers will rarely even have time to do anything intelligent before they get shot.

It’s certainly possible to make such technology responsive; Assassin’s Creed shows that (although there’s still room for improvement). But typically you don’t have the time and budget to capture all the animations required to get that responsiveness just for the AI characters: it takes hours of mocap at different speeds, with different motion combinations, starting on different feet, and so on.
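As a back-of-the-envelope illustration of why the capture list explodes, here’s a tiny C++ snippet. The category counts are invented; the multiplication is the point.

    #include <iostream>

    int main() {
        // Hypothetical coverage needed for responsive parametric locomotion.
        const int speeds            = 4;   // walk, jog, run, sprint
        const int turnAngles        = 8;   // 45-degree increments
        const int startingFeet      = 2;   // left or right foot planted
        const int stances           = 3;   // standing, crouched, prone
        const int startStopVariants = 3;   // start, stop, continue

        int clips = speeds * turnAngles * startingFeet * stances * startStopVariants;
        std::cout << clips << " locomotion clips to capture and clean up\n";  // 576
    }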

On top of that, things become trickier when you develop the AI separately from the animation logic… (For player control, it’s much simpler.)

Squad Behaviors

Screenshot 4: Enemy soldiers require responsive movement.

Rethinking the AI / Animation Interface

Traditionally, the AI controls the animation via a virtual controller or some kind of moving carrot. The justification for doing this is that you can interchange players with AI and still use the same animation system. In FPS games where the player has no visible avatar, there’s still a division between animation management and the AI logic.

This type of API is certainly simple, but in the case of advanced animation systems, it just isn’t expressive enough to allow for responsive yet realistic motion. This is a form of behavior aliasing caused by a bottleneck of information in the code.
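Here’s roughly what that narrow interface looks like in its simplest form. The names are illustrative rather than any particular engine’s API, but the shape of the bottleneck is the point: nothing about timing, available options, or visual cost ever crosses the boundary.

    #include <iostream>

    struct Vec3 { float x = 0, y = 0, z = 0; };

    // The classic "moving carrot" contract between AI and animation.
    class ILocomotionController {
    public:
        virtual ~ILocomotionController() = default;
        virtual void setMoveTarget(const Vec3& target, float desiredSpeed) = 0;  // all the AI can say
        virtual Vec3 currentPosition() const = 0;                                // all it can ask back
    };

    // Stub implementation just so the example runs; a real one drives the motion system.
    class SimpleController : public ILocomotionController {
    public:
        void setMoveTarget(const Vec3& target, float) override { position_ = target; }
        Vec3 currentPosition() const override { return position_; }
    private:
        Vec3 position_;
    };

    int main() {
        SimpleController anim;
        anim.setMoveTarget({5.0f, 0.0f, 2.0f}, 3.5f);   // the AI's entire vocabulary
        std::cout << "x = " << anim.currentPosition().x << "\n";
    }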

To fix this, you need to extend this interface significantly:

  • The AI needs a better idea of the animation logic so it can take into account all the possible options from a logical perspective, and trade off their cost/length.

  • The animation logic needs to know the intentions of the AI so it can select the best motion clips ahead of time, as well as find the most responsive option. (A rough sketch of such an interface follows this list.)
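For concreteness, here’s a hedged C++ sketch of what a wider AI/animation interface could look like, based on the two bullets above: the animation side exposes the concrete options it actually has (with time and cost), and the AI side declares its intentions early so clips can be lined up ahead of time. All names and fields are hypothetical.

    #include <iostream>
    #include <string>
    #include <vector>

    struct MotionOption {
        std::string name;           // e.g. "stop_in_two_steps"
        float       duration;       // how long before the character is where it was asked to be
        float       footSkateCost;  // visual penalty if the AI forces it sooner
    };

    class IAnimationInterface {
    public:
        virtual ~IAnimationInterface() = default;

        // AI -> animation: intentions, declared before they are needed.
        virtual void declareIntent(const std::string& likelyNextAction) = 0;

        // Animation -> AI: the concrete options available right now.
        virtual std::vector<MotionOption> queryOptions() const = 0;
    };

    class MotionGraphInterface : public IAnimationInterface {
    public:
        void declareIntent(const std::string& action) override { pendingIntent_ = action; }
        std::vector<MotionOption> queryOptions() const override {
            // A real implementation would search the motion data from the current
            // pose; here we return canned options so the example runs.
            return {{"finish_stride_then_stop", 0.7f, 0.0f},
                    {"plant_and_turn_now",      0.2f, 0.5f}};
        }
    private:
        std::string pendingIntent_;
    };

    int main() {
        MotionGraphInterface anim;
        anim.declareIntent("take_cover_left");            // AI intention, ahead of time
        for (const MotionOption& o : anim.queryOptions()) // AI can now trade off cost/length
            std::cout << o.name << ": " << o.duration << "s, skate " << o.footSkateCost << "\n";
    }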

Not only is this a technical problem, it’s also important to get the workflow right. You can’t just abstract either system away as a black box and expect everything to turn out O.K.

Motion Warping

Screenshot 5: Finding a balance between AI and animation.

The Solution for Responsive & Realistic AI

The fact is that both systems are highly constrained by what the other is doing. Sure, you can fix this by extending the interface between the two, but ideally there’s a better solution (a rough sketch in code follows the list):

  • Develop the animation system the same way the AI is built, using goal-directed systems to figure out the best way to achieve an objective with the animations available.

  • Consider the animation as a lower level of the AI system, and integrate it very closely with the AI rather than abstracting it away.

  • Prototype the AI bottom-up, based on the real animations available, at least as much as you design it top-down.

  • Use multi-disciplinary sub-teams in AI/animation that work tightly together to resolve any problems immediately.
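As a final sketch, and purely as an illustration rather than anything taken from a shipped engine, here’s what a goal-directed animation layer could expose to the AI: instead of issuing a move order and hoping, the AI asks for concrete plans (sequences of real clips) that achieve its objective, then picks one by trading off responsiveness against visual quality. Everything here (names, clips, costs) is invented.

    #include <iostream>
    #include <string>
    #include <vector>

    struct MotionPlan {
        std::vector<std::string> clips;  // real captured clips to play, in order
        float timeToComplete;            // responsiveness, for the AI's decision
        float visualCost;                // foot-skate / pops introduced
    };

    // The animation layer enumerates ways of achieving the AI's objective.
    std::vector<MotionPlan> plansFor(const std::string& objective) {
        if (objective == "reach_cover_quickly")
            return {
                {{"run_start", "run_cycle", "run_stop_crouch"}, 1.6f, 0.0f},
                {{"sprint_burst", "slide_to_cover"},            0.9f, 0.2f},
            };
        return {};
    }

    int main() {
        // The AI states its goal and how much it values speed over polish.
        const float urgency = 2.0f;
        MotionPlan best{{}, 1e9f, 1e9f};
        for (const MotionPlan& p : plansFor("reach_cover_quickly"))
            if (urgency * p.timeToComplete + p.visualCost <
                urgency * best.timeToComplete + best.visualCost)
                best = p;

        for (const std::string& clip : best.clips)
            std::cout << clip << "\n";
    }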

I have no doubt that Crytek will manage to get this right in their next game. Having made the mistake once, it’s easier to adapt and get it right the second time. I look forward to seeing the results; in the meantime, be sure to check out Crysis! (See Amazon U.S. or U.K.)

Discussion (3 Comments)

johnfredcee on December 11th, 2007

I've been at the sharp end of the AI/Animation divide: it's all about control: the AI engineers need a level of control that simply destroys believable motion: at its worst it produces unplayable farragoes like Driv3r. The cheap way of doing this is to set up sets of animations suitable for blending together to produce actions: aiming up/down, run/walk/limp, and expose an interface for the AI people to control these actions via the blend weights, a la Unreal or Mechwarrior (or later incarnations of Driver). I've come to the conclusion that the next-gen needs something much better and settled on IK based on a technique called the Inverse Jacobian, which has the good property of playing well with constraints and can be plugged into a physical system, as it's based on forward-estimation and can incorporate acceleration into its processes: making this work with a physics engine is enough to keep a small team busy for months. No wonder studios house small armies these days.

alexjc on December 11th, 2007

John, thanks for your great comment and welcome to the blog! I fully agree with you about control. In the animation section of the site (http://aigamedev.com/animation) I reviewed a bunch of papers, and there were a few that offered some really good interfaces to provide that control — in particular those types of parametric motion. Ultimately I think once we've worked out a good interface for the AI to control the animation as necessary (trying to find the best compromise between realism and responsiveness), then the remaining low-level animation synthesis should be designed to fulfill those constraints no matter what. I agree that I.K. is great for this, but in some cases it's perfectly acceptable (and necessary) to use motion capture data. The trick is dynamically figuring out which to use and when... Academics have a bunch of names for different kinds of interfaces that are sensible for this, including PFIK+F (which stands for per-frame I.K. with filtering). So the idea is that you specify poses in time and space that are constraints, then let the animation run as an optimization to find the best way to deliver realistic motion. But yeah, as you said, it's not trivial! Alex

ojcme on December 12th, 2007

Do you think that behavioral animation systems will solve the problem of animation interfacing with AI or is that too far off to even be an option at this point?
