Directing Traffic in the Dungeon: Communicating Intentions To Party AI

Dave Mark on June 10, 2008

This week in his column, Dave Mark ponders tricks we can use to share our intentions with AI-controlled characters. Join the developer discussion and let him know how you think we can solve this problem.

Hey you! Keep it movin’!

“Thank you for entering the large orc dining hall. Please listen carefully as our menu has changed.

If you would like to attack the orcs in this room, press 1. If you would like to retreat, press 2. If you would like to –

If you would like to have the thief hide in shadows, press 1. If you –

If you would like the thief to hide in the shadows on the left side of the room, press 1. If you would like the thief to hide in the shadows on the right side of the room, press 2. If you would like the thief to hide in the shadows behind the large chairs on the other side of the banquet table, press 3. If you would like to hear these options ag –

Your thief will hide on the left side. If you would like him to assassinate the orc leader, press 1. If y –

Your thief has now been instructed. Whether or not he can be trusted remains to be seen.

If you would like your wizard to cast offensive spells, press 1. If you would like him to cast buffs on you, press 2. If you woul –

If you would like your wizard’s offensive spells to be area of effect in nature, press 1. If you will be wading into the fray and would like your wizard to show some restraint for a change, press 2. –

Your wizard scowls at you but nods that he understands.

If you would like your paladin to attack the first orc from the left, press 1. If you would like your paladin to attack the second orc from the left, press 2. If you would like your paladin to attack the third orc from th –

You have entered a number that is not recognized. Please count the orcs again, remembering to start from the left. If you would like to hear this menu again –

You have entered a number that is not recognized. Please count the orcs again, remembering to start from the left. If you would like to hear thi –

You have entered a number that is not recognized. Please count the –

We’re sorry, but you have died. Perhaps next time you should spend more time fighting and less time trying to communicate your intentions to your party.

Thank you for using our Advanced AI Direction Interface. Goodbye.”

Wow. Technology sure is nice and all… but I sure would like to talk to a human once in a while.

The Problem with Sidekicks

Those of you who read last week’s column, “The Art of AI Sidekicks: Making Sure Robin Doesn’t Suck”, may have seen the flurry of comments that came in. Right out of the gate, Kevin Dill (Rockstar New England) touched on something that set the direction for many of the remaining submissions. He said:

“It strikes me that if you want to do this well, then you need to crack the hard problem of discerning the player’s intent.”

This is, of course, a particularly difficult issue even to approach from a design standpoint, much less solve from an algorithmic one. Setting aside the algorithmic problem of actually having the AI ascertain intent, let’s address the design side first.

What the heck does he mean?

In real life, militaries and other organized groups have spent thousands of years developing ways of communicating intent. Obviously, there is the spoken word as the most direct and usually concise way of expressing something. There are other methods, however, such as hand signals. If you have seen military movies such as Saving Private Ryan, you may have seen some of these in action. (Apparently the hand signals in Saving Private Ryan and Band of Brothers came from the same military consultant, but are inaccurate since they were not in use during WWII.) (I particularly like this parody of the above hand signals.) The bottom line is that real humans have no shortage of ways to communicate either personal intent or orders to others.

As game players, we are often on the receiving end of these. Our allies will use the spoken word or even point in directions or at objects. With a little help from the UI on occasion (glowing objects or paths or on-screen directional arrows seem to cover many of the methods), we can usually ascertain what it is that we are supposed to do.

Likewise, many discussions have been had and great care has been taken to solidify a now-accepted premise in game development – “Communicate the AI’s intent to the player.” If an AI agent will flank, let the player know. If an AI agent will retreat, let the player know. If an AI agent is panicking, let the player know that it is panic and not your AI skitzing out. Anyway, you get the picture about how we are meant to get the picture.

Receiving but not Sending

Did he just dis’ me?

The problem comes when the roles are reversed. When we want to communicate our intent, either through speech or through action, our hands are, quite literally, tied. We can’t speak to our allies, we can’t do fancy hand signals, we can’t point, we can’t even make eye contact. That means that the AI agent’s inference about our intentions has to be entirely situational and/or entirely reactive. It has to think about what it would do in the situation (something that we haven’t quite perfected anyway), or it has to simply wait for us to do something first and tag along like a little brother. So there is now a major block in one of the fundamental aspects of teamwork. We are cast in the role of a mute, paralyzed leader without the ability to communicate to the very people who are meant to follow our every order.

This barrier used to exist in online co-op play as well. However, that was somewhat mitigated by the invention, technical feasibility and immediate popularization of voice connections. While we may not be able to do all the pretty hand signals, those were only necessary when distance or the proximity of enemies prevented shouting. Given that you are quite possibly speaking to someone sitting 3000 miles away, proximity isn’t an issue. And one would hope that the enemy isn’t tapped into your voice channel in-game, so there goes the second concern — silent stealth is not an issue. In shooter games, there is still no way to point at something, but a little extra verbosity solves that problem. But, short of a seriously robust voice recognition and NLP package, we aren’t going to be talking to our AI pals that much anytime soon.

“Short of a seriously robust voice recognition and Natural Language Processing package, we aren’t going to be talking to our AI pals that much anytime soon.”

In RTS/TBS games, we have been able to do the pointing thing for some time. By “pinging the map”, we can draw our partner’s attention to a particular spot. Since, by their very nature, you are doing a lot of map-pointing in those types of games, this is easily done without breaking the flow of your actions. However, describing what it is you want to have happen at the identified point is not always obvious. In co-op play, we can use the chat box, etc. However, while our AI ally will very easily be able to determine the location of the ping, he won’t necessarily know what to do about it unless we encode more information.
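The "encode more information" idea above can be sketched very simply: attach an intent tag to the ping itself, so the AI ally gets both a location and a hint about what to do there. This is only an illustrative sketch; the `Ping` structure and the intent names are invented for this example.

```python
from dataclasses import dataclass
from enum import Enum, auto

# Hypothetical intent tags that a map ping could carry alongside its location.
class PingIntent(Enum):
    ATTACK = auto()
    DEFEND = auto()
    SCOUT = auto()

@dataclass
class Ping:
    x: int
    y: int
    intent: PingIntent  # the extra information a bare map ping lacks

def describe(ping: Ping) -> str:
    """What an AI ally could read off an annotated ping."""
    return f"{ping.intent.name} at ({ping.x}, {ping.y})"
```

The point of the tag is that the ally no longer has to guess what the location means; the hard part, as discussed, is getting that tag out of the player without breaking the flow of play.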

Again, that is easily done in the realm of the keyboard-driven RTS. But is it something that we can then map back into the FPS/RPG world — especially on consoles? With enough buttons and sticks to press and wiggle, we should be able to at least put together a rudimentary vocabulary of commands and intents. Point your gun at a barrel and say “Hide behind that“. Point your view at an enemy and say “Dude is waaay too scary for me… you take him.” While it looks OK on paper, in a firefight the last thing I want to do is point away from my target in order to tell my cohort to “go around to the left“. Maybe slipping a thumb down onto the D-pad to convey something as simple as advance, retreat, flank left or flank right is doable… but anything more than that and you might as well put a whistle in my mouth and a funny hat on my head.
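That four-entry D-pad vocabulary is about as simple as an input mapping gets. A minimal sketch, assuming a controller layer that reports D-pad presses as strings (the button and command names are made up for illustration):

```python
# Mapping the four D-pad directions onto the tiny command vocabulary above.
DPAD_COMMANDS = {
    "up": "advance",
    "down": "retreat",
    "left": "flank_left",
    "right": "flank_right",
}

def issue_command(button: str) -> str:
    # Anything else falls back to "hold" rather than interrupting the player.
    return DPAD_COMMANDS.get(button, "hold")
```

Four commands is likely the ceiling for this scheme; every additional order needs either a modifier button or a menu, which is exactly the whistle-and-funny-hat problem.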

I suppose another method of input would be something along the lines of gesture recognition. Back in the days of Black & White, you could do some nifty stuff by drawing the right patterns on the screen. It might be quite feasible, and even somewhat realistic, to transform that into a way of gesticulating. Of course, doing Black & White style gestures might be a bit trickier on console controllers. Even so, regardless of the platform, we are usually armed with only one “arm”, if you will — only one direction of pointing. If I’m looking down my sights at someone, I would like to keep that view while waving to my posse. That being said, the Wii-mote would be delightful for this purpose.
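A toy recognizer in the Black & White spirit might quantize each stroke segment to a compass direction, collapse repeats, and look the resulting string up in a template table. The gesture names and templates below are invented; real gesture recognition (in Black & White or elsewhere) is considerably more tolerant of sloppy strokes.

```python
# Invented gesture templates: direction strings -> gesture names.
TEMPLATES = {
    "RDLU": "circle_of_protection",
    "D": "lightning_bolt",
    "RL": "wave_to_posse",
}

def quantize(p0, p1):
    """Reduce one stroke segment to its dominant compass direction."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    if abs(dx) > abs(dy):
        return "R" if dx > 0 else "L"
    return "U" if dy > 0 else "D"

def recognize(stroke):
    """Match a list of (x, y) points against the template table."""
    dirs = []
    for p0, p1 in zip(stroke, stroke[1:]):
        d = quantize(p0, p1)
        if not dirs or dirs[-1] != d:  # collapse consecutive repeats
            dirs.append(d)
    return TEMPLATES.get("".join(dirs))  # None when no template matches
```

The single-pointer limitation discussed above still applies: you only get to draw one stroke at a time, and only when your reticle isn't busy doing something else.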


Screen 1: Casting spells via gesture recognition in Lionhead’s “Black & White”

Beyond Point and Shoot

Of course, much of the above discussion is more about design decisions and interface assignments. What about the AI? Going back to Kevin’s comment — how do we get the AI ally to understand what it is that we are trying to accomplish? When creating enemy AI or only loosely associated allies, our progeny are largely on their own. If they do something, it is because they decided to do it on their own, using their own little brains. If it happens to be stupid, so be it. Everyone makes mistakes, right? Even if the other allied soldiers go charging ahead when you are feeling like a chicken, it serves to make you feel more like a chicken. If they hang back while you go barreling into the fray, you assess your own bravery by the yardstick set by your relatively immobile squad-mates. However, if you have someone with whom you are supposed to be working in some sort of quasi-symbiotic relationship, you really expect that you are working together — not just in parallel.

Some gestures are meant as a warning to other squad-mates. Often, the warning comes too late.

Part of the process would be identifying what particular inputs one would use as part of that algorithm. Surely, much of it is game dependent, but there would be some typical behaviors. Those would not necessarily be without confusion, however. For example, if I creep up to a corner and peer around it, what am I going to do next? Am I planning on surprising the enemies with a pop-out and rush attack, am I planning to blind-fire around the corner, or am I simply verging on wetting myself? Maybe I’m just observing and formulating an approach route for me and my partner… which I won’t be able to communicate to him anyway.
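The corner-peeking ambiguity above is exactly what a situational-inference rule would have to disambiguate. A sketch of that kind of rule, guessing intent from a short history of low-level actions (the action and intent names here are invented for illustration):

```python
# Guess player intent from the last few observed low-level actions.
def infer_intent(recent_actions):
    history = list(recent_actions)[-3:]  # only the last few actions matter
    if history[-1:] == ["peek_corner"]:
        if "reload" in history:
            return "preparing_rush"   # topped off the magazine first
        if "crouch" in history:
            return "observing"        # settled in to watch the room
    return "unknown"                  # ambiguous... or verging on panic
```

Note how quickly the rules run out: most sequences land in "unknown", which is the design problem in a nutshell.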

I suppose one could throw in one of the staples of our AI wish lists at this point: “Can this be done with learning?” Being completely idealistic about things, we could say that a learning algorithm tied to our actions would begin to figure out what we meant when we did certain things. On the other hand, speaking for myself as a game player, there are plenty of times when I’m not sure what it is I’m even going to do next. I can’t imagine a learning algorithm that would easily sift through my fits and starts and be any better off than when we started.
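For what it's worth, the idealized version of that learning idea is little more than bookkeeping: tally what the player actually did after each observed situation, and predict the most frequent follow-up. This sketch is deliberately naive (the class and situation names are invented), and real play, with all its fits and starts, is exactly the noise that would swamp it.

```python
from collections import Counter, defaultdict

# Frequency-based "learning" of player intent: situation -> action tallies.
class IntentLearner:
    def __init__(self):
        self.history = defaultdict(Counter)

    def observe(self, situation, action):
        """Record what the player did in a given situation."""
        self.history[situation][action] += 1

    def predict(self, situation):
        """Guess the most frequent follow-up, or None if never seen."""
        counts = self.history[situation]
        if not counts:
            return None  # refuse to guess rather than guess badly
        return counts.most_common(1)[0][0]
```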

So, I guess this question is a little more abstract than I intended this week. What it boils down to, however, is this: Are we entirely dependent on the interface for our allied communication? And, if so, what are the potential solutions? To be honest, if we don’t come up with any, allied AI agents are forever going to be upstaged by online co-op play.

Discussion (7 Comments)

Ian Morrison on June 11th, 2008

In your typical FPS, the direction of a player's view and the actions they're taking only give you information on very low level player actions. If a player is staring at something, it tells you only that he's very interested in something in that direction. If he's shooting, you know he has a target and is engaged. Outside that, though, you only have history to work off of. If the player is moving from cover to cover in a certain direction, you can usually have a good guess of where he's trying to go. Even then, though, you don't know if he's running, attacking, flanking, or just picking a spot at random. In short, as you've summed up, there aren't a lot of ways to communicate intent as it stands.

Here's a thought: commandeer the "use" button on your typical FPS and give it a context sensitive nature for when you aren't pushing buttons or opening doors. Alternately, have an entirely different button for NPC interaction that does the same job. From there, you have a couple more direct, intentional ways to interact with NPCs:

* Point at something. Look at it and press the button, and the AIs can be certain that it's a point of interest without having to guess. If they can't figure out what the player wants them to do, then they can lapse into default behaviours... keeping an eye on that area, doing a quick search, and letting the player know that they don't understand. When they do understand, though, you can communicate things like "I want this button pushed" and "that enemy over there is interesting somehow!"
* Click and drag. Specifically, look at one object, hold the button down, and move your view to something else for some context sensitive action that combines two objects or locations. This could be something like "this squad member should go here", "this key needs to be put in this door", or "move over, dummy." This is a bit of a "draw on chalkboard" style of communication.

Taking it further, you could add more things to the interaction. If the pointing thing is being done with the use button, you could have the left mouse button push/point emphatically while the right mouse button grabs or does a very general gesture of some kind.

I think what I'm long-windedly getting at here is that we need to give the players more tools for interacting socially. They've got the tools to interact with the environment, which doesn't take much (in Half-Life, you're essentially a set of eyeballs with a terrifyingly large arsenal), but we haven't given them ways to talk to people, communicate intent, or otherwise interact with virtual human beings. The player can DO things, but he cannot SAY things. Until we do, any attempts to have AI guess the player's intent will be just that: guesswork. Without (and even with) things like powerful and robust voice recognition it'll still be very difficult to figure out exactly what the player wants, but we should still be trying to find elegant and intuitive ways to let the player "say" things. I mean, hell, haven't you ever had a game where you just wish you could give the one-fingered salute to that unbeatable boss character? :P
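A minimal sketch of the two interactions described in the comment above, a plain press marking a point of interest and a press-and-drag pairing two objects (the target names and returned command tuples are hypothetical):

```python
# Commandeered "use" button: one handler for both point and click-and-drag.
def handle_use(press_target, release_target=None):
    if release_target is None or release_target == press_target:
        # Simple point: mark the object or location as interesting and let
        # the AI fall back to default behaviours if it can't work out why.
        return ("point_of_interest", press_target)
    # Click-and-drag: combine the two into one command,
    # e.g. "this squad member should go here".
    return ("combine", press_target, release_target)
```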

Dave Mark on June 11th, 2008

Wow. Hey Alex... Kevin is bucking for his own column around here! :-)

Andrew on June 11th, 2008

It was said by Peter Molyneux that Fable 2 specifically did a dog because human intelligence modelling is wayyy off. In any case, for squad based games, you can assume some structural limits to the world. I loved Republic Commando's simple command system - you led a squad of 3 other commandos, and had these methods of communication:

* Squad Orders (set with function keys F1 - F4). These apply to the entire squad:
  * Search and Destroy - Squad went ahead of the leader, attacking everything. Useful mainly in cleanup.
  * Form Up - Follows the player.
  * Secure Area - Sticks the squad to a specific location (where the player is facing currently), allowing for the squad to be sent somewhere.
  * Cancel Maneuvers - If they were doing something individually (like sniping, etc.) it will cancel it and return those members to the main group. The AI knew when you were too far away for it to be of use (in most situations), so calling them repeatedly usually wasn't necessary.
* Individual orders (pretty sure this was "F" or "Use") were assigned to the nearest squad member, who would detach from the main group to do them. These were context-sensitive "outlined in blue" actions the squad members could perform, e.g.:
  * Concentrate on Enemy - Press F on an enemy, and all the squad tries to kill it (while carrying on other orders like those below).
  * Sniping point (uses sniper rifle), Anti-armour (uses anti-armour weapon), Grenades (uses grenades).
  * Door breaching (all 3 squad members mount up for this, and throw grenades first), like a SWAT team might do.
  * Hacking / Blowing stuff up - Objectives mainly. Usually a tense minute of hacking while being assaulted by enemies.
  * Heal teammate (or if you are down, heal me!), whoever is knocked down.

All the individual actions could be performed by the player too, which made for great parity in actions available. You could be a coward at the back (although at times your individual use of any weapon at any time was a good idea to take advantage of), but working as a team was much easier. The number of context-sensitive locations was enough to make it seem right too. It grew on me towards the end of the game. It really works well, because the encounters were well designed (yes, entirely linear, and not too tactical, but there we go) and well polished, with the AI not getting confused. Helpfully, you didn't die when the player died - this, unlike most FPS games (actually, 99% of them), allowed the player to take the forefront, and if taken down, still get back up to fight.

Rainbow Six Vegas has a few similar things, but doesn't have the same style of objective-based gameplay, and is mainly run-and-gun. Surprisingly, it is more complicated to control the squad though. Brothers in Arms does have a nice tactical view (although it limited the actions to mainly "Take cover there", "Shoot them gits", "Assault that place by running into the enemy (and probably die)" and "Follow me" to keep it simple, and squad members basically could just shoot - they rarely used grenades).

Anyway, this is my experience of a well tuned system, one I'd like to see more of - squad commands and individual commands. Allow parity between player and NPC actions, and make it clear what the AI is doing at all times with regards to player actions (Republic Commando had a lot of voice lines for each squad member confirming actions). What might be a problem is complicating the system - weapon loadouts, "stealth", complicated terrain (in the form of ladders, jumping puzzles, crawling, corridors that only one person can fit down, or worse, mined areas that NPCs die on if they pass over), and making it really easy to die (or having the player and NPC allies have crap aim all the time... stupid Gears of War...). Add one more element and you'd need a minimum of one toggle button or menu option for it, and that gets dead hard dead fast!
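The two-tier scheme described in the comment above, one squad-wide mode plus detachable context-sensitive individual orders, could be sketched like this. The class and mode names loosely follow the Republic Commando description; the code itself is invented for illustration.

```python
from enum import Enum, auto

# Squad-wide modes, roughly matching the F1-F4 squad orders described above.
class SquadMode(Enum):
    SEARCH_AND_DESTROY = auto()
    FORM_UP = auto()
    SECURE_AREA = auto()

class Squad:
    def __init__(self, members):
        self.members = list(members)
        self.mode = SquadMode.FORM_UP
        self.individual_orders = {}  # member -> context action, e.g. "snipe"

    def set_mode(self, mode):
        """Squad order: applies to everyone at once."""
        self.mode = mode

    def give_order(self, member, action):
        """Detach one member from the group for a context-sensitive action."""
        self.individual_orders[member] = action

    def cancel_maneuvers(self):
        """Return everyone to the main group."""
        self.individual_orders.clear()
```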

Dave Mark on June 11th, 2008

[quote=Andrew]It was said by Peter Molyneux that Fable 2 specifically did a dog because human intelligence modelling is wayyy off.[/quote] Unless that's Peter's Significant Scientific Achievement? @Kevin and Xavier: I like both of those interface ideas for AI use. Since we can't explain to the AI "the one with the pointy beard and funky looking eyes", we can break the containment of realism and use a tagging system. Also, to both of you, especially Kevin: remember you can use a forum account to do your replies. It makes it easier for others to reply to you using forum tools as well.

Kevin Dill on June 11th, 2008

@Dave: So noted. @Xavier: Yes, those marks are tremendously useful for coordinating the actions of 25 (or even 5) people! They're a little clunky to use, and don't have any implicit meaning at this point (though conventions have developed around some of them), but definitely a starting point. Of course, you can only put them over the heads of PCs and NPCs. I can imagine wanting to be able to mark other things (such as positions). There are also little smoke pots that you can throw around to mark position, and we've played with them some, but the problem with them is that the range is too short. Still - an interesting approach here might be to try to find out what the successful raiding guilds do, and build from there.

Andrew on June 12th, 2008

[quote=Dave Mark]Unless that's Peter's Significant Scientific Achievement?[/quote] Hohoho, let the speculation begin :) I'd not be surprised if it was, although he was pretty adamant that human-like behaviour modelling is a difficult premise, one I agree with wholeheartedly.

Baron PI on June 12th, 2008

The intent issue sure is big and unsolved, both in the virtual and the real world as far as I know. Since a commander's intent is given in free text, it is important that the subordinate is aware of the implicit intent, as it may not be as self-evident as the explicit intent, depending on the subordinate's prior knowledge of the specific area of interest. To overcome these hurdles, work is being done to formalize an unambiguous language called Battle Management Language (or Coalition Battle Management Language, C-BML) that is based on the Command and Control Lexical Grammar (C2LG). I personally don't see why this language, built on the five Ws (when, where, why, who, what), wouldn't fit a computer game AI if it is successful in the military environment. I don't have any deep knowledge of C-BML or C2LG, but I find it very intriguing that the intent issue is present in both the military (human-to-human) and AI (human-to-machine or machine-to-human) settings. I briefly mention this in the background of my B.Sc. thesis; maybe the references there can give the interested a new thought or two along the way :)
