The Structure of Action Game AI

Mikko Mononen on August 29, 2008

This article was submitted by Mikko Mononen, who was the Lead AI Programmer on Crysis. He is currently working on as-yet-unannounced indie games for Secret Exit and Anyfun Games.

In game development, the team working on the AI must be one of the most multi-disciplinary of them all. The artists have to build the objects according to a specification so that the AI can use them, the animators create the most important aspect of non-player characters, the AI is often the most complicated part of the design problem, and the level designers and scripters must understand how the AI works in order to create challenging and robust levels. The programming tasks related to the AI are also spread across several kinds of specialized programmers, from animation programmers to tools programmers to gameplay programmers, not to mention the AI programmers themselves.

I could almost say that this dominates the whole design process. There are many things we do not consider to be "AI" (non-artificial and unintelligent) that are nevertheless tightly coupled to the whole process of creating an AI for an action game such as a first-person shooter.

Motivation

In this article, I will flesh out how this process can be structured so that all the aforementioned people can work together to accomplish their goal: creating an amazing action game. Traditionally, the structure of an action game has been built by the programmers. The problem with such a structure is that every single thing that is put into the game has to be translated from one structure, created by the designers, to another, created by the programmers. Often these structures have very little in common, so a lot is lost in translation. The first casualties are the tiny nuances that are important from the design point of view, but may be complicated and feel unnecessary to the programmers.

This article describes a structure for an action game that is loosely arranged in such a fashion that the multidisciplinary team can work together. The interfaces between the different parts of the structure create a “contract” for how the assets and design should be authored. These interfaces are also adapted into the technical structure of the system. This means that everyone, from designers to artists to programmers, is talking in the same terms.

I guess every producer agrees that good communication is the key to delivering a project on time. Good communication does not mean a lot of chatter but accurate messages. When left- and right-brained people talk with each other, the same kind of translation process takes place as when design ideas are translated into something more concrete like code. Each party forgives a lot of the tiny mistakes the others make as they speak, and the group can talk for hours without noticing that they are actually talking about totally different parts of the problem!

It helps a lot to talk across this barrier when there is a good mental model both sides can relate to. It is even better if you can pin this mental model on the wall and point your finger at the area that you are talking about. It helps to arrange the thoughts so that they can later be presented much more fluently.

The structure of the mental model makes it possible to solve and validate design ideas from different points of view. For example, a designer may have a “cool idea” about how to make the AI act in a certain gameplay scenario. When he plants his new piece of design in the structure, he is able to follow which other things may need to be changed in order to get the feature into the game. Maybe it is possible to fuse the idea into something existing, or maybe this time around the new idea creates too many asset requirements and needs to be scrapped.

Figure 1: Overview of the actions, contexts, settings, goals and scenes and their relationships.

The rest of the article describes how the structure of an action game can be laid out. The structure is purely pragmatic, although it has strong parallels to some game design principles such as Chris Crawford’s taxonomy of interactive entertainment.

In a nutshell, the structure is as follows. The game world is built of tiny fragments of locations called contexts, and the AI can perform actions and reactions within these contexts. Actions are the active things that the AI can do, and reactions are actions which respond to player actions or events in the game world. Contexts can be grouped into settings where the AI behaves according to a goal. The goal is basically the driving force for all AI actions, and goals can be grouped into a scene which describes how the goals change according to events in the game world. Finally, a scene defines a whole section of a game level.

Contexts

At the very core of any action game are the scenes where the action happens. Each of these scenes consists of different locations in the level which have gameplay importance, like cover behind craters, cover next to a wall, or cover behind a window frame. We call these locations contexts.

When an artist is creating a context he needs to know what the AI is able to do in that location. The range of actions creates a contract between the artist, animators and gameplay programmers. This contract is basically the dimensions, shape and required structure of the object or context being created. For example, some cover objects are required to be a certain height and convex in order to reuse the animation assets and keep the runtime calculations low. The window frame may require a specific kind of frame and extra annotation so that the character can jump through the window.

The contract of context acts as an abstraction of the context. Once the contract is made, the artists are free to create any kind of context and object that fits the spec. The beauty of the contract is that the animators and gameplay programmers can start working in parallel with the artists. Once all the assets start to take shape, more testing and iteration will naturally take place.

Figure 2: The Contract of Context defines the dimensions of the object. The measures may also include things like required clearance around the object.
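
As a rough illustration of how such a contract could be checked automatically, here is a minimal Python sketch. The class names, the fields and the use of metres are assumptions made for this example; the article does not prescribe any particular data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextContract:
    """Hypothetical contract for one kind of context, e.g. a low cover object."""
    name: str
    min_height: float      # metres (assumed unit)
    max_height: float
    min_clearance: float   # free space required around the object
    must_be_convex: bool


@dataclass(frozen=True)
class ContextAsset:
    """What the artist delivers; the fields mirror the contract."""
    name: str
    height: float
    clearance: float
    is_convex: bool


def contract_violations(contract, asset):
    """Return human-readable reasons why the asset breaks the contract."""
    problems = []
    if not contract.min_height <= asset.height <= contract.max_height:
        problems.append(f"{asset.name}: height {asset.height} is outside "
                        f"{contract.min_height}..{contract.max_height}")
    if asset.clearance < contract.min_clearance:
        problems.append(f"{asset.name}: clearance {asset.clearance} is below "
                        f"{contract.min_clearance}")
    if contract.must_be_convex and not asset.is_convex:
        problems.append(f"{asset.name}: shape must be convex")
    return problems
```

A check like this could run when an asset is imported, so that the artist hears about a broken contract before the animators or gameplay programmers do.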

Actions and Reactions

An action is anything that an AI can perform in a context. It could be shooting, taking cover, dying, melee punching the enemy or just standing still. That list might be enough extra information for the person who is going to model the object or location; the rest is dictated by the contract. The animators have further constraints on how the animations are tied together, since their creative work is connected to the more procedural runtime behavior made by the gameplay programmers.

The actions can be further divided into two categories, actions and reactions. The reactions are the actions that make the AI an interactive toy. They should be the very first thing that is designed and implemented, not pathfinding or any other nice and challenging piece of technology.

There should be an easily understood reaction for every single action the player can perform, and an easily understood reaction for every single event that happens in the game world. If the player is able to melee, the AI must respond to this in every single context. If there is a physicalized box which may collide with the AI, there should be a meaningful reaction to that in every single context.

The more proactive things the AI may perform in the context are called actions. These include shooting, throwing grenades, or taking cover. They are controlled by a somewhat higher level behavior which tries to fulfill a goal. (More about goals in the next section.)

From a programmer's point of view, the actions and reactions could form a tiny state machine, or a simple decision tree. The important fact is that we are not trying to create a state machine or decision tree which is able to handle every single scenario in the game. We just want to create the most robust solution for the AI in this particular context, performing a finite number of actions and responding to a finite number of events.

In some cases it may be a good solution to hardcode the state machines and expose them as parametrized, ready-to-use blocks for certain types of contexts. For example, every convex cover object could be handled using the same piece of code, while the assets and object dimensions might change.

Figure 3: The Contract of Action defines the actions and the transitions between the actions. The agent may have different states when it is using the context. For example, a ladder requires transition states for entering and exiting the ladder as well as the state for using the ladder.
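
To make the idea of a tiny per-context state machine concrete, here is a minimal Python sketch of a transition table for one hypothetical "low cover" context. The state names, triggers and events are invented for illustration. The sketch also includes a check that every state has a reaction for every external event, which is the robustness requirement described above.

```python
# Illustrative names only; the article does not prescribe states or events.
EVENTS = ("hit", "melee", "grenade_nearby")  # external events needing reactions

# (state, trigger) -> next state for one hypothetical "low cover" context.
LOW_COVER = {
    # actions: chosen by the higher level, goal-driven behavior
    ("hiding", "peek"): "aiming",
    ("aiming", "shoot"): "aiming",
    ("aiming", "duck"): "hiding",
    ("flinching", "recover"): "hiding",
}
# reactions: every state must answer every external event
for _state in ("hiding", "aiming", "flinching"):
    LOW_COVER[(_state, "hit")] = "flinching"
    LOW_COVER[(_state, "melee")] = "flinching"
    LOW_COVER[(_state, "grenade_nearby")] = "hiding"


def step(table, state, trigger):
    """Advance the machine; a trigger that is not valid here leaves the state unchanged."""
    return table.get((state, trigger), state)


def missing_reactions(table, events=EVENTS):
    """List the (state, event) pairs that have no reaction defined."""
    states = {state for state, _ in table} | set(table.values())
    return sorted((s, e) for s in states for e in events if (s, e) not in table)
```

Because the table is plain data, the same `step` function could serve every convex cover object, with only the table and the assets changing per context type.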

How to Design Actions…

The goal of designing actions and reactions for a context is to create a robust agent which can react to any event that is possible in the game world. The contexts and actions should make both the designers and programmers aware of all the possible events you need to handle.

Once you have a list of the actions and reactions you need to perform, it is far easier to come up with interesting ideas than by just trying to think how an AI should behave. How should the AI shoot or take cover when climbing a ladder? Or how should the AI melee or throw grenades when driving a boat?

Behaviors are a side effect of the sequences of events which are required to fulfill a goal. Just as working on animations for a context may require changing its dimensions, later work on goals might require more actions to be added. It might be O.K. to omit certain actions, like throwing grenades, if they do not make any sense within the context, but the AI needs to be able to perform the reactions in every context since they are triggered by external events.

Unfortunately, AI systems cannot yet be solved as elegantly as physics simulations, where the same rules can be applied to a wide range of input data. Most AI work is about creating special case code or assets. The more we reach for realism, the more special cases there are to fill in. Currently the best tool is a carefully designed game world. Everything in the game world had better have a good purpose, since we are going to need to create special reactions for it.

There is no definite order in which the creation of contexts and actions should happen. It is just important to know how they affect each other. Every new location will require new kinds of actions and reactions which suit the context. Every new action or reaction may require special assets for each context.

One way to visualize the asset requirements is to create a matrix of contexts against actions plus reactions. If the AI is able to have distinct states, like suspicious and alerted, some actions and reactions may require more assets. The matrix is a good tool for estimating how much work new player verbs or new contexts in the game will generate. When the contexts, player actions and AI actions are defined, the matrix will give quick answers to questions like “how many animations do we need for character X?”.
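
The matrix can be sketched in a few lines of Python. This is only an estimate under a simplifying assumption that is not in the article: every (context, verb) pair needs one asset per AI state. The function name and the example verbs are hypothetical. Actions may be skipped for a context where they make no sense, but reactions may not.

```python
from itertools import product


def asset_matrix(contexts, actions, reactions, ai_states, skipped=frozenset()):
    """Return one row per required asset: (context, verb, ai_state).

    `skipped` holds (context, action) pairs that are deliberately omitted.
    Reactions are never skipped, because they answer external events.
    """
    rows = []
    for context, verb in product(contexts, list(actions) + list(reactions)):
        if verb in actions and (context, verb) in skipped:
            continue
        for ai_state in ai_states:
            rows.append((context, verb, ai_state))
    return rows
```

Counting the rows for a given character then gives a first answer to the "how many animations" question, and adding a new player verb to `reactions` shows immediately how many rows it costs.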

The second contract is the Contract of Actions, which in turn depends on the contract of context. The contract of actions is basically a contract between the game logic and its representation. It defines the actions and reactions and how they are connected to each other within the given context. It also defines how the transitions between different contexts are interfaced.

For example, when an AI wants to change cover, it will first be in one cover context, then in a flat ground context and finally in another cover context. In this case the state of the AI that is used as the interface between the different contexts could be standing on flat ground.
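
One possible way to express this interface, sketched in Python under assumed names: each context lists the agent states in which it can be entered and left, and a route between contexts is valid only if each pair of neighbours shares at least one state.

```python
# Hypothetical data: the context names and states are made up for illustration.
CONTEXTS = {
    "cover_a":     {"enter": {"standing"}, "exit": {"standing"}},
    "flat_ground": {"enter": {"standing", "crouching"},
                    "exit": {"standing", "crouching"}},
    "cover_b":     {"enter": {"standing"}, "exit": {"standing"}},
}


def interface_states(route, contexts=CONTEXTS):
    """For each hop along the route, return the states shared by exit and entry.

    An empty set means the two contexts have no common interface state,
    so a transition (and probably an animation) is missing.
    """
    return [contexts[a]["exit"] & contexts[b]["enter"]
            for a, b in zip(route, route[1:])]
```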

One way to look at the reactions is that they always represent a change in the state of the AI. Another is that the reactions try to encode a common ruleset which always needs to apply regardless of the higher level goal. If the AI must always take cover when being hit, this can be implemented as a reaction. But if the character should react to being hit differently depending on the situation (say, a character running away scared and not reacting to nearby whizzing bullets, versus a character having a smoke and waiting for the action to begin), then the decision should be made at the goal level.

Goals

The actions and contexts create a patch of possible actions in the scene. Some actions happen around objects, some cover larger ground like navigating on a flat surface or navigating on stairs. The interfaces between different contexts create a structure of connected things that the AI can do. For example a barrel cover object may connect to a flat ground area, or a fence context which includes a jump action may connect two flat ground areas.

This creates a nice structure where different kinds of behaviors can be executed. But instead of designing behaviors, we should design goals. A behavior is something that the AI does in order to fulfill a goal. The difference may sound like nitpicking, but it is really important. It forces the designers as well as the programmers to define the goal state of the AI. Designing behaviors to fulfill goals usually channels the energy towards a common solution and avoids passive behaviors.

The level designers lay out the structure of the objects and locations. For example, the structure and actions for a building are different from the structure and actions for combat in a forest.

The building can be described as a set of doors, rooms, windows and choke points. Using this structure it is possible to create a sweep behavior where the goal is to check every room and storm through the doors. Choke points are special areas like stairways where normal navmesh-type navigation would have problems. The design for the forest could require that the cover objects must be convex and that the area should be continuous. This allows the programmer to create nice and robust procedural flanking behaviors with great designer control.

Figure 4: The Contract of Setting defines the structure of the environment. In the above example, the structure required for the forest is a set of convex covers in an unobstructed environment. Since the fence creates an obstruction, additional actions are added to allow the AI to jump over the fence. This restriction makes it possible to build simple group maneuvers, since the group will not get stuck in local minima on the graph which connects the cover locations.

The structure of each location is defined in the Contract of Setting. It is a set of rules and guidelines for how different environments should be built so that the procedural algorithms used by goals have well formed data to work with. It is likely that a certain set of goals will share the same contract. Ideally each level would be built out of well defined patches which nicely overlap and blend into each other.
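
Parts of such a contract can be validated by a tool. As one hedged example, the Python sketch below checks a single assumed rule: every cover in a setting must be reachable from every other cover over the links between them, counting extra actions such as a fence jump as links. The function name and the link representation are hypothetical, and a real contract would contain more rules than connectivity.

```python
from collections import deque


def unreachable_covers(links, start):
    """Return the covers that cannot be reached from `start`.

    `links` is a list of (cover_a, cover_b) pairs that the AI can move
    between in both directions, including special actions like a jump.
    """
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    seen, queue = {start}, deque([start])
    while queue:  # plain breadth-first search over the cover graph
        for neighbor in graph.get(queue.popleft(), set()) - seen:
            seen.add(neighbor)
            queue.append(neighbor)
    return sorted(set(graph) - seen)
```

In the forest example, a fence without a jump action would split the covers into two groups, and the check would list the covers on the far side.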

It may be desirable to build behaviors for groups instead of individuals. Even the per-context actions can have certain controls which depend on the number of characters in the area or the number of characters per group. Considering the whole group instead of individuals helps to solve cases like the stairway choke points mentioned above. In that scenario, instead of trying to solve the movement to cover one character at a time, which creates chaos and a tough time for the crowd control algorithm, the whole group could slide towards a more covered area like pearls on a string.

The same robustness principle that applies to actions within a context should be applied to goals too. Each goal should define 30 seconds of fun, while the reactions around objects and locations should create 5 seconds of meaningful interaction. Instead of trying to build behaviors which handle every single scenario in the game, we should strive to create behaviors which fulfill just a handful of goals, preferably just one, in a very well defined setting.

For human-sized characters, the dimensions of a context usually vary from the surroundings of a barrel to the area next to a car. For characters of similar size, the area governed by one goal should be somewhere between a tennis court and a soccer field, depending on the environment.

Scene

Whereas a goal covers a group of contexts, a scene consists of a group of goals. The goals should be selected to create a certain type of gameplay or a certain mood for the scene. The areas of the goals can overlap, and the current goal can change based on some higher level logic.

For example, the goal for a group of characters can be different depending on how the player assaults their camp. There could be one goal for the case where the player is sniping the camp from a long distance, one for the case where the player assaults the camp directly on foot, and one more for the case where the player approaches the camp in a vehicle.

Some of these deciding factors are based on the story too. In systems where one general, all-purpose system tries to make the AI look good in all scenarios, this level of control often leads to scripted scenes and on-rails AI. With the proposed structure it is possible to create a well defined setting with a well defined goal, for example one where an AI-driven vehicle is running away from an enemy helicopter, and to set that as the goal of the vehicle. The result should be emergent behavior where the player is also able to intervene in the situation and maybe help the vehicle survive.

Since the scene logic controls segments of 30 seconds of gameplay, it can and should be fairly simple. For example, the lowest level actions could be controlled using finite state machines, the middle level goal-solving behaviors could use planning, and the highest level dynamic scene control could use a simple decision tree. There are many good ways to solve each of these control problems; behavior trees, for example, may be a good choice for the highest level logic too. To get the best results, the system used to select the different bits should never be so complex that the designers who use it cannot understand it. AI should not be a magic black box.

Figure 5: The scene describes a set of overlapping goals and the logic which allows a group of AI characters to switch between the goals. In the above example, the goal for the AI could be to flank the player if the player is in the forest (A), charge towards the player if the player is on the open field (B), or storm the building if the player is inside it (C).
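
The scene logic from Figure 5 is small enough to write down as a flat decision tree. The Python sketch below uses invented area and goal names, and the fallback goal for a player outside all three areas is an assumption added for the example.

```python
def select_goal(player_area):
    """Pick the group goal from the area the player currently occupies."""
    if player_area == "forest":        # area A in Figure 5
        return "flank_player"
    if player_area == "open_field":    # area B
        return "charge_player"
    if player_area == "building":      # area C
        return "storm_building"
    return "hold_position"             # assumed fallback, not in the article
```

Logic this plain is easy for a designer to read and change, which is the point made above about avoiding a magic black box.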

The scene logic can of course react to the player actions more often than every 30 seconds, but since the goals take time to accomplish it makes sense not to change them too often. The usual anti behavior-aliasing tools like hysteresis for temporal and spatial rules are good candidates to help smooth things out.
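
As a sketch of one such smoothing tool, the Python class below applies a dwell time, a simple form of temporal hysteresis: the group only commits to a new goal after the scene logic has asked for it continuously for a while. The class name and the default of 3 seconds are assumptions for illustration.

```python
class GoalSwitcher:
    """Commit to a new goal only after it was requested for `hold_time` seconds in a row."""

    def __init__(self, initial_goal, hold_time=3.0):
        self.current = initial_goal
        self.hold_time = hold_time
        self._candidate = None
        self._elapsed = 0.0

    def update(self, wanted_goal, dt):
        """Feed the goal the scene logic wants this frame; return the active goal."""
        if wanted_goal == self.current:
            # The request went away again: forget the pending switch.
            self._candidate, self._elapsed = None, 0.0
            return self.current
        if wanted_goal != self._candidate:
            # A different goal is being requested: restart the timer.
            self._candidate, self._elapsed = wanted_goal, 0.0
        self._elapsed += dt
        if self._elapsed >= self.hold_time:
            self.current = wanted_goal
            self._candidate, self._elapsed = None, 0.0
        return self.current
```

A player who briefly steps across the boundary between two goal areas would then not make the whole group abandon what it is doing.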

Following the previous analogy, the dimensions of each scene probably range from the size of a soccer field to the size of a mall. A whole level will consist of several of these scenes. The higher we go in the control hierarchy, the more rigid the structure is going to be. The structure of the scenes could be linear, or in more sandbox-like games you could visit the scenes in any order you wish, which creates the illusion of a consistent and continuous world even if we technically only keep updating the scene the player is actively interacting with.

Some events can be shared between scenes too. Just as we could modify the goals of a scene based on how the player is interacting with it, we could as easily change the goals based on events from previous scenes. Failing to destroy a helicopter in scene 1 could send it on to scene 2, and the characters there would be alarmed and reinforced, instead of sorting bananas, when the player arrives.

Conclusion

The important concepts are contexts, actions, goals, behaviors and scenes, and the contracts that are created around each of the concepts. The concepts should be simple enough that everyone on the team can understand the high level idea and how it affects his or her work.

The structure should allow designers to validate and estimate their ideas: how much work a new feature will generate, or what kind of things need to be designed to put one idea in. Furthermore, the structure offers natural places to inject the story.

If we were to continue the list, the concept above the scene would be the story of the game. Somewhat weaker pieces of that story are also told by how a scene is set up (does the AI retreat or attack, and in which kind of environment?), whilst the actions and reactions hardly tell any story at all. The higher up we are in the hierarchy, the closer we are to the rigid story structure, and the closer we are to the bottom, the closer we are to the interactive game rules.

The hierarchical structure also creates natural places for incremental testing phases. Once the animator has imported his data into the game editor and verified that it still looks nice with all that quaternion compression, he should be able to test his new assets within the context, manually tweak the settings to see that the asset behaves nicely at the extremes, and finally check how the more procedural code uses his new asset. At its simplest, that procedural code could be a piece of code which randomly selects valid transitions to other actions and reactions. Each goal-solving behavior should be testable separately too, and the level designer should be able to reset his current scene to a certain state and see if it plays out well.
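
The simplest test harness mentioned above, code that randomly selects valid transitions, could look roughly like this Python sketch. The transition map format and the function name are assumptions; a fixed seed keeps a run repeatable so that a bad blend can be reproduced.

```python
import random


def random_walk(transitions, start, steps, seed=0):
    """Randomly follow valid transitions and return the list of visited states.

    `transitions` maps each state to the list of states reachable from it.
    The walk stops early at a dead end, which is itself worth reporting.
    """
    rng = random.Random(seed)
    state, visited = start, [start]
    for _ in range(steps):
        options = transitions.get(state, [])
        if not options:
            break
        state = rng.choice(options)
        visited.append(state)
    return visited
```

Playing back the visited states in the editor lets the animator watch every blend the runtime could produce, without waiting for the goal-level behaviors to exist.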

Especially when working with a large team, it is essential that everyone is able to test their own work. Different teams may have different schedules. That new animation which is so desperately needed to make the new vehicle combat look perfect may be weeks away, and it would be a shame to wait even longer for another round of tweaks and fixes when the animation does not blend with the existing assets once it is put in the game.

When the structure of the game is split into nicely defined chunks on different levels, it enables people to work more efficiently. The contracts allow people to work in parallel. Once the contracts have matured a bit, the art team can crank out cool new variations for the already existing library of actions.

I have presented a hierarchical action game structure whose motivation is practical: how to chop things down into manageable pieces so that people from many disciplines can work in parallel and understand the game creation process. The structure is highly dynamic, allowing changes at all levels of detail, from the game flow at the mission level to minute-to-minute AI decisions.

Discussion 5 Comments

alexjc on August 29th, 2008

Nice post Mikko. I really enjoyed editing that! There were many more little nuggets in there that I wanted to highlight: [quote]There should be easily understood reaction for every single action the player can perform and there should be easily understood reaction for every single event that happens in the game world.[/quote] [quote]The important fact is that we are not trying to create a state machine or decision tree which is able to handle every single scenario in the game, but we just want to create the most robust solution for the AI in this particular context performing the finite number of actions and responding to finite number of events.[/quote][quote]One way to visualize the asset requirements is to create a matrix of contexts and actions plus reactions. If the AI is able to have distinct states, like suspicious and alerted, some actions and reactions may require more assets. The matrix is a good tool to estimate how much work a new player verbs or new contexts in the game will generate.[/quote][quote]Designing behaviors to fulfill goals usually channels the energy towards a common solution and avoids the passive behaviors.[/quote]If only all levels were designed with all this in mind, AI would actually be much easier to implement! Alex

MieszkoZ on September 1st, 2008

An excellent read, Mikko! Some very good points made. This kind of nicely laid out structure (and pictures!) really help designers and artists get the grip of the whole picture. I'll introduce those concepts to my team right away! Also, the way you described action/reaction part suggests you build AI system around [I]smart objects[/I] or [I]smart terrain[/I] (pick the name you like) which is my favorite way of handling AI actions execution - just pick a spot, and the 'spot' will tell AI actor what animation to play (a little simplified, but that's the general idea ;) ). Anyway, thanks for this art, Mikko, and I hope to see more!

memoni on September 2nd, 2008

Thanks for the kind words, maybe I'll pester you with another article in the future ;) ak-73, I tried to pick a coherent set of features. I think I used the term event in pretty much the usual way, so I did not want to highlight it. The agent (I think I used the word AI throughout the article) is important too, but my definition of it is pretty much the usual. MieszkoZ, I think there are some common things with the smart-object/terrain stuff too. I think my contribution was that you should not just build objects which have just some actions, but instead you create contexts (smart objects) where the agent is required to perform certain set of reactions and may perform certain actions, further, these contexts are grouped together to create a carpet of continuous actions (maybe this is the smart terrain thing?) which is required to have certain structure too. So far all the smart-object articles I've seen pitch the idea so that the smart-objects shout out what they can do, whereas in my model the goal/behavior tells what to do and the reactions and actions in a context describe how. Those actions also include actions which are required as transitions between the contexts based on the structure of the scene (say some navigational things like using ladders or jumping down a ledge). The difference may sound irrelevant, but these are the important nuances that may be overlooked when a design idea is translated into code. Some of the ideas for the setting and scene stuff came from Damian Isla's GDC talk: Building a Better Battle. So if you are interested in the topic, you should check it out. [url]http://www.bungie.net/images/Inside/publications/presentations/betterbattle.zip[/url] The audio is also available at GDCradio.com (search for GDC08_400).

MieszkoZ on September 2nd, 2008

[QUOTE=memoni;4609]So far all the smart-object articles I've seen pitch the idea so that the smart-objects shout out what they can do(...)[/QUOTE] That's true. But I'm a big fan of the idea of encapsulating AI instruction sequences (like: stand, play anim, duck) in a flexible form of (smart) objects that can be added later on without changing a single line of code. And those objects don't have to broadcast information on what needs they can satisfy (so maybe I should call them 'mute smart objects' ;) ). Let's say you have 'covers' implemented as smart objects. You (or a special AI subsystem) can then say "go to that cover, near position (x,y,z)", and the AI actor will go there, and cover in whatever way object tells him to. This way at any point in development you can add new, special types of covers (like: car cover, tree cover, high grass cover, open field cover, etc.) with absolutely no additional implementation cost on cover selection code side. I guess what I call smart objects you call contexts (which reminds me: smart object - in my view - doesn't have to be 'object' per se, it can be an area or volume as well). I think I'll be using 'context' from now on, since it's more precise :) [QUOTE=memoni;4609]Some of the ideas for the setting and scene stuff came from Damian Isla's GDC talk: Building a Better Battle. So if you are interested on the topic, you should check it out. [url]http://www.bungie.net/images/Inside/publications/presentations/betterbattle.zip[/url] The audio is also available at GDCradio.com (search for GDC08_400).[/QUOTE] A really nice resource, thanks!

archanger on September 2nd, 2008

As the level designer I like the idea of smart objects, or objects of that meaning. I have one project in design stage and I already studied several options for AI behaviors within scene. I also divided scenes into several groups (contexts based, something like this), those groups are connected as mentioned by navigational things: like doors, ladders, some ramps or stairs. For me will be important action and goal for player, for NPCs. Behavior would fill the space action and goal and other less important things. While this is in early stage of designing AI I’m quite satisfied what I learned on AIgamedev.com for past few months, and it wasn’t time wasting. :) Thanks Mikko, thanks Alex, thanks other guys, :) I’ll keep coming back.
