“Thank you for entering the large orc dining hall. Please listen carefully as our menu has changed.
If you would like to attack the orcs in this room, press 1. If you would like to retreat, press 2. If you would like to –
If you would like to have the thief hide in shadows, press 1. If you –
If you would like the thief to hide in the shadows on the left side of the room, press 1. If you would like the thief to hide in the shadows on the right side of the room, press 2. If you would like the thief to hide in the shadows behind the large chairs on the other side of the banquet table, press 3. If you would like to hear these options ag –
Your thief will hide on the left side. If you would like him to assassinate the orc leader, press 1. If y –
Your thief has now been instructed. Whether or not he can be trusted remains to be seen.
If you would like your wizard to cast offensive spells, press 1. If you would like him to cast buffs on you, press 2. If you woul –
If you would like your wizard’s offensive spells to be area of effect in nature, press 1. If you will be wading into the fray and would like your wizard to show some restraint for a change, press 2. –
Your wizard scowls at you but nods that he understands.
If you would like your paladin to attack the first orc from the left, press 1. If you would like your paladin to attack the second orc from the left, press 2. If you would like your paladin to attack the third orc from th –
You have entered a number that is not recognized. Please count the orcs again, remembering to start from the left. If you would like to hear this menu again –
You have entered a number that is not recognized. Please count the orcs again, remembering to start from the left. If you would like to hear thi –
You have entered a number that is not recognized. Please count the –
We’re sorry, but you have died. Perhaps next time you should spend more time fighting and less time trying to communicate your intentions to your party.
Thank you for using our Advanced AI Direction Interface. Goodbye.”
Wow. Technology sure is nice and all… but I sure would like to talk to a human once in a while.
The Problem with Sidekicks
Those of you who read last week’s column, “The Art of AI Sidekicks: Making Sure Robin Doesn’t Suck”, may have seen the flurry of comments that came in. Right out of the gate, Kevin Dill (Rockstar New England) touched on something that set the direction for many of the remaining submissions. He said:
“It strikes me that if you want to do this well, then you need to crack the hard problem of discerning the player’s intent.”
This is, of course, a particularly difficult issue to even approach from a design standpoint, much less solve from an algorithmic one. Setting aside the algorithmic problem of actually having the AI ascertain intent, let’s address the design side first.
In real life, militaries and other organized groups have developed ways of communicating intent over thousands of years. The spoken word is, obviously, the most direct and usually the most concise way of expressing something. There are other methods, however, such as hand signals. If you have watched military movies such as Saving Private Ryan, you may have seen some of these in action. (Apparently the hand signals in Saving Private Ryan and Band of Brothers came from the same military consultant, but are inaccurate since they were not in use during WWII.) (I particularly like this parody of the above hand signals.) The bottom line is that real humans have no shortage of ways to communicate either personal intent or orders to others.
As game players, we are often on the receiving end of these. Our allies will use the spoken word or even point in directions or at objects. With a little help from the UI on occasion (glowing objects or paths or on-screen directional arrows seem to cover many of the methods), we can usually ascertain what it is that we are supposed to do.
Likewise, many discussions have been had and great care has been taken to solidify a now-accepted premise in game development – “Communicate the AI’s intent to the player.” If an AI agent will flank, let the player know. If an AI agent will retreat, let the player know. If an AI agent is panicking, let the player know that it is panic and not your AI skitzing out. Anyway, you get the picture about how we are meant to get the picture.
Receiving but not Sending
The problem comes when the roles are reversed. When we want to communicate our intent, either through speech or through action, our hands are effectively tied. We can’t speak to our allies, we can’t do fancy hand signals, we can’t point, we can’t even make eye contact. That means that the AI agent’s inference about our intentions has to be entirely situational and/or entirely reactive. It has to think about what it would do in the situation (something that we haven’t quite perfected anyway), or it has to simply wait for us to do something first and tag along like a little brother. So there is now a major block in one of the fundamental aspects of teamwork. We are cast in the role of a mute, paralyzed leader without the ability to communicate with the very people who are meant to follow our every order.
This barrier used to exist in online co-op play as well. However, it was somewhat mitigated by the invention, technical feasibility, and immediate popularization of voice connections. While we may not be able to do all the pretty hand signals, those were only necessary when distance or the proximity of enemies prevented shouting. Given that you are quite possibly speaking to someone sitting 3,000 miles away, distance isn’t an issue. And one would hope that the enemy isn’t tapped into your in-game voice channel, so there goes the stealth concern as well. In shooter games, we still lack the ability to point at something, but a little extra verbosity solves that problem. But, short of a seriously robust voice recognition and NLP package, we aren’t going to be talking to our AI pals that much anytime soon.
“Short of a seriously robust voice recognition and Natural Language Processing package, we aren’t going to be talking to our AI pals that much anytime soon.”
In RTS/TBS games, we have been able to do the pointing thing for some time. By “pinging the map”, we can draw our partner’s attention to a particular spot. Since, by their very nature, you are doing a lot of map-pointing in those types of games, this is easily done without breaking the flow of your actions. However, describing what it is you want to have happen at the identified point is not always obvious. In co-op play, we can use the chat box, etc. However, while our AI ally will very easily be able to determine the location of the ping, he won’t necessarily know what to do about it unless we encode more information.
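To make that “encode more information” idea concrete, here is a toy sketch of what an annotated ping might look like. The `PingIntent` tags, the `Ping` fields, and `ally_response` are all invented for illustration; no real engine’s API is implied.

```python
from dataclasses import dataclass
from enum import Enum, auto

# Hypothetical intent tags a co-op ping could carry alongside its position.
class PingIntent(Enum):
    ATTACK = auto()
    DEFEND = auto()
    SCOUT = auto()

@dataclass
class Ping:
    x: float
    y: float
    intent: PingIntent  # the extra information encoded with the map ping

def ally_response(ping: Ping) -> str:
    """Map an annotated ping to a concrete ally behavior (illustrative only)."""
    responses = {
        PingIntent.ATTACK: f"move to ({ping.x}, {ping.y}) and engage",
        PingIntent.DEFEND: f"hold position at ({ping.x}, {ping.y})",
        PingIntent.SCOUT: f"recon ({ping.x}, {ping.y}) and report",
    }
    return responses[ping.intent]
```

The location alone tells the ally where to look; the tag supplies the “what to do about it” half that a bare ping is missing.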
Again, that is easily done in the realm of the keyboard-driven RTS. But is it something that we can then map back over into the FPS/RPG world, especially on consoles? With enough buttons and sticks to press and wiggle, we should be able to at least put together a rudimentary vocabulary of commands and intents. Point your gun at a barrel and say “Hide behind that“. Point your view at an enemy and say “Dude is waaay too scary for me… you take him.” While it looks OK on paper, in a firefight, the last thing I want to do is point away from my target in order to tell my cohort to “go around to the left“. Maybe slipping a thumb down onto the D-pad to convey something as simple as advance, retreat, flank left, or flank right is doable… but anything more than that and you might as well put a whistle in my mouth and a funny hat on my head.
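A minimal sketch of that rudimentary vocabulary, with hypothetical D-pad bindings. The point is that a single press plus whatever the player is already aiming at is enough to form a complete order, with no need to look away from the target:

```python
from typing import Optional

# Hypothetical D-pad bindings for the four one-button squad orders.
DPAD_ORDERS = {
    "up": "advance",
    "down": "retreat",
    "left": "flank left",
    "right": "flank right",
}

def issue_order(dpad_direction: str, aim_target: Optional[str] = None) -> str:
    """Combine a one-button order with the player's current aim target."""
    order = DPAD_ORDERS.get(dpad_direction)
    if order is None:
        raise ValueError(f"unmapped input: {dpad_direction}")
    # The aim target supplies the missing "at what" half of the command.
    return f"{order} toward {aim_target}" if aim_target else order
```

Anything richer than those four verbs, as noted above, starts to outgrow a thumb on the D-pad.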
I suppose another method of input would be something along the lines of gesture recognition. Back in the days of Black & White, you could do some nifty stuff by drawing the right patterns on the screen. It might be quite feasible, and even somewhat realistic, to transform that into a way of gesticulating. Of course, doing Black & White-style gestures might be a bit trickier on console controllers. Even so, regardless of the platform, we are usually armed with only one “arm”, if you will: only one direction of pointing. If I’m looking down my sights at someone, I would like to keep that view while waving to my posse. That being said, the Wii-mote would be delightful for this purpose.
Screen 1: Casting spells via gesture recognition in Lionhead’s “Black & White”
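For the curious, stroke matching of this kind doesn’t have to be exotic. A toy matcher in the spirit of the well-known $1 recognizer (resample the stroke to a fixed point count, normalize position and scale, then compare point-by-point against stored templates) fits in a few dozen lines. Everything below is illustrative, not any shipping game’s code:

```python
import math

def _resample(points, n=32):
    """Resample a stroke to n evenly spaced points along its path length."""
    total = sum(math.dist(points[i - 1], points[i]) for i in range(1, len(points)))
    interval = total / (n - 1)
    out = [points[0]]
    pts = list(points)
    d = 0.0
    i = 1
    while i < len(pts):
        seg = math.dist(pts[i - 1], pts[i])
        if d + seg >= interval and seg > 0:
            t = (interval - d) / seg
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # treat the new point as the next segment start
            d = 0.0
        else:
            d += seg
        i += 1
    while len(out) < n:  # pad against floating-point shortfall
        out.append(pts[-1])
    return out

def _normalize(points):
    """Translate the stroke's centroid to the origin and scale to a unit box."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    pts = [(x - cx, y - cy) for x, y in points]
    scale = max(max(abs(x) for x, _ in pts), max(abs(y) for _, y in pts)) or 1.0
    return [(x / scale, y / scale) for x, y in pts]

def recognize(stroke, templates):
    """Return the name of the template with the smallest average point distance."""
    candidate = _normalize(_resample(stroke))
    best_name, best_score = None, float("inf")
    for name, tmpl in templates.items():
        ref = _normalize(_resample(tmpl))
        score = sum(math.dist(a, b) for a, b in zip(candidate, ref)) / len(ref)
        if score < best_score:
            best_name, best_score = name, score
    return best_name
```

Sloppy controller strokes still land on the nearest template, which is exactly the tolerance a mid-firefight gesture system would need.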
Beyond Point and Shoot
Of course, much of the above discussion is more about design decisions and interface assignments. What about the AI? Going back to Kevin’s comment — how do we get the AI ally to understand what it is that we are trying to accomplish? When creating enemy AI or only loosely associated allies, our progeny are largely on their own. If they do something, it is because they decided to do it on their own, using their own little brains. If it happens to be stupid, so be it. Everyone makes mistakes, right? Even if the other allied soldiers go charging ahead when you are feeling like a chicken, it serves to make you feel more like a chicken. If they hang back while you go barreling into the fray, you assess your own bravery by the yardstick set by your relatively immobile squad-mates. However, if you have someone with whom you are supposed to be working in some sort of quasi-symbiotic relationship, you really expect that you are working together — not just in parallel.
Part of the process would be identifying what particular inputs one would use as part of that algorithm. Surely, much of it is game-dependent, but there would be some typical behaviors. Those would not necessarily be without confusion, however. For example, if I creep up to a corner and peer around it, what am I going to do next? Am I planning on surprising the enemies with a pop-out and rush attack, am I planning to blind fire around the corner, or am I simply verging on wetting myself? Maybe I’m just observing and formulating an approach route for me and my partner… which I won’t be able to communicate to him anyway.
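As a strawman, that "identify the inputs" step could be as simple as scoring candidate intents against whatever cues the game can observe about the player. The cue names and weights below are pure invention; the point is only the shape of the algorithm:

```python
# Invented evidence weights: positive cues support an intent, negative ones
# argue against it. A real game would tune these per title.
INTENT_EVIDENCE = {
    "ambush": {"at_corner": 2.0, "weapon_raised": 1.5, "moving": -1.0},
    "observe": {"at_corner": 1.0, "weapon_raised": -0.5, "moving": -1.5},
    "retreat": {"moving": 1.5, "facing_exit": 2.0},
}

def infer_intent(observations: dict) -> str:
    """Score each candidate intent against boolean observations; return the best."""
    def score(weights):
        return sum(w for cue, w in weights.items() if observations.get(cue))
    return max(INTENT_EVIDENCE, key=lambda intent: score(INTENT_EVIDENCE[intent]))
```

And, true to the corner-peeking example above, the same observations ("at a corner, weapon raised") can score several intents closely, which is precisely the ambiguity problem.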
I suppose one could throw in one of the staples of our AI wish lists at this point: “Can this be done with learning?” Being completely idealistic about things, we could say that a learning algorithm tied to our actions would begin to figure out what we meant when we did certain things. On the other hand, speaking for myself as a game player, there are plenty of times when I’m not sure what it is I’m even going to do next. I can’t imagine a learning algorithm that would easily sift through my fits and starts and be any better off than when we started.
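For completeness, the most naive version of that learning idea is just a frequency table mapping an observed context to whatever the player actually did next, then predicting the most common follow-up. The class and its labels are hypothetical; whether it would survive my fits and starts is another matter entirely:

```python
from collections import Counter, defaultdict

class IntentLearner:
    """Toy learner: count which action the player took in each observed
    context, then predict the most frequent one the next time around."""

    def __init__(self):
        self.table = defaultdict(Counter)

    def observe(self, context: str, action: str) -> None:
        """Record what the player actually did in this context."""
        self.table[context][action] += 1

    def predict(self, context: str):
        """Guess the player's next action, or None with no history
        (in which case the ally is back to tagging along)."""
        counts = self.table.get(context)
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```

Even this crude counter illustrates the failure mode from the paragraph above: a player whose behavior is half noise produces a table whose "most frequent" entry barely beats the alternatives.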
So, I guess this question is a little more abstract than I intended this week. What it boils down to, however, is this: Are we entirely dependent on the interface for our allied communication? And, if so, what are the potential solutions? To be honest, if we don’t come up with any, allied AI agents are forever going to be upstaged by online co-op play.