This is the second article covering the research sessions at GDC 2007 in Lyon, specifically those relating to artificial intelligence in games. You can find part 1 here, and be sure to stay tuned for the final part 3 next week as part of the Thursday Theory on AiGameDev.com.
The theme for this post is Embodied Conversational Agents (ECA). The idea, in a nutshell, is to model characters that can interact with each other and with a dynamic world. Traditionally in games, developers use motion capture and voice acting for anything that requires believable behaviors. This research is still a long way from that level of realism, but there are some great insights to help you add context-sensitive behaviors to your in-game actors.
Photo 1: Post-GDC party photos from the streets of Lyon.
Gaze Control of Conversational Agents
Gerard Bailly, Senior Research Director at CNRS, gave a talk about Gaze Control of Animated Conversational Agents during Face to Face Interaction. I couldn’t make this talk as I had to take part in the panel on next-gen AI. However, Gerard was kind enough to send me his slides and pointers to the relevant papers.
Here are some of the key ideas in the talk which I think can be useful in practice for game developers. I threw in some extra tips myself!
- The Statistics of Eye Movement
Studying human gaze patterns really helps build better logic for eye control. For example, visual reaction time ranges from 190-350ms depending on the number and complexity of objects. Eye movement itself is made up of saccades (lasting 30-120ms and spanning 1° to 40°) and fixations (3-5 per second, each lasting 200-300ms).
Tip: Use these default values when implementing your gazing logic!
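To make the tip concrete, here is a minimal gaze timing sketch in Python that uses the default values from the talk. The class and method names are hypothetical, invented for illustration; the constants are the numbers quoted above.

```python
import random

FIXATION_MS = (200, 300)   # 3-5 fixations per second, 200-300ms each
SACCADE_MS = (30, 120)     # saccade duration
REACTION_MS = (190, 350)   # visual reaction time to a new stimulus

class GazeTimer:
    """Alternates between fixations and saccades on human-like timings."""

    def __init__(self):
        self.state = "fixation"
        self.remaining = random.uniform(*FIXATION_MS)

    def notice(self, stimulus):
        """React to a new stimulus after a plausible human delay."""
        self.state = "reacting"
        self.remaining = random.uniform(*REACTION_MS)

    def update(self, dt_ms):
        """Advance by dt_ms; switch state when the current phase expires."""
        self.remaining -= dt_ms
        if self.remaining > 0:
            return self.state
        if self.state in ("fixation", "reacting"):
            self.state = "saccade"
            self.remaining = random.uniform(*SACCADE_MS)
        else:
            self.state = "fixation"
            self.remaining = random.uniform(*FIXATION_MS)
        return self.state
```

A controller like this would pick a new gaze target at the start of each saccade and hold it during the following fixation.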
- Gaze Awareness
Humans are apparently very sensitive to the gaze of others. They often adopt the focus point of another person as an attention point of their own, even when that point lies outside their field of view. This sometimes results in emergent group staring behaviors.
Tip: Use this idea when you want the player to focus on something; just get the AI characters to look at the object in question!
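A toy sketch of this gaze-following idea: an agent adopts the gaze target of a visible neighbour, even when that target is outside its own field of view. The function name and the dict-based data layout are assumptions for illustration, not from the talk.

```python
def follow_gaze(agent, others, can_see):
    """If the agent can see someone gazing at a target, share that focus.

    agent/others are dicts with a 'gaze_target' key; can_see(a, b) is a
    game-specific visibility test (line of sight, field of view, etc.).
    """
    for other in others:
        if can_see(agent, other) and other.get("gaze_target") is not None:
            agent["gaze_target"] = other["gaze_target"]
            return agent["gaze_target"]
    return agent.get("gaze_target")
```

Run this every few hundred milliseconds across a crowd and you get the emergent group staring for free, which is exactly what you want when steering the player's attention.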
- Blinking Patterns
Blinking and saccades rarely relate to what’s being said or the state of the character.
Tip: Use very simple logic for controlling blinking and saccades (even a statistical or random approach).
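In that spirit, a purely statistical blink controller can be just a couple of timers. The interval and duration constants below are my own plausible assumptions, not numbers from the talk.

```python
import random

BLINK_INTERVAL_MS = (2000, 6000)  # assumed pause between blinks
BLINK_DURATION_MS = 150           # assumed time the lids stay closed

class Blinker:
    """Blinks at random intervals, independent of speech or mood."""

    def __init__(self):
        self.closed = False
        self.timer = random.uniform(*BLINK_INTERVAL_MS)

    def update(self, dt_ms):
        """Advance by dt_ms; returns True while the eyes are closed."""
        self.timer -= dt_ms
        if self.timer <= 0:
            if self.closed:
                self.closed = False
                self.timer = random.uniform(*BLINK_INTERVAL_MS)
            else:
                self.closed = True
                self.timer = BLINK_DURATION_MS
        return self.closed
```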
- Eyelid Movement
Eyelid behavior has evolved to provide maximum protection of the eye yet still maximize the field of vision, depending on where the eyes are focused. See Alix Casari’s work in English [PDF, 1.0 Mb] or a more relevant one in French [PDF, 497 Kb].
Tip: Vary the height of the eyelids based on where your character is looking. If you don’t do this, or get this wrong, you could end up expressing certain emotions accidentally!
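One simple way to act on this tip is to tie upper-eyelid openness to the vertical gaze angle: raise the lid when the eyes look up, lower it when they look down. The linear mapping and the constants here are assumptions for illustration only.

```python
def eyelid_height(gaze_pitch_deg, neutral=0.75, gain=0.01):
    """Return upper-eyelid openness in [0, 1] for a vertical gaze angle.

    gaze_pitch_deg: positive = looking up, negative = looking down.
    neutral: openness when looking straight ahead (assumed value).
    gain: how strongly the lid follows the gaze (assumed value).
    """
    openness = neutral + gain * gaze_pitch_deg
    return max(0.0, min(1.0, openness))
```

Getting this baseline right matters because a lid that hangs too low while looking up reads as sleepy or sad, which is the accidental emotion the tip warns about.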
- Scene Scrutinizing
When looking at scenes, hierarchical scrutinizing is a very natural behavior. Essentially, you break the scene down into key regions and objects, focus on those first, then go into more detail within each group as time permits.
Tip: You can use an attention stack very effectively to organize the things that the agent wishes to look at, and go into more detail when it runs out of items.
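The attention stack from the tip can be as simple as this sketch: push coarse regions first, push their details on top as time permits, and pop back to the coarser level when the agent is done. The API is my guess at how this might look in a game, not the lab's code.

```python
class AttentionStack:
    """Coarse items at the bottom, finer details pushed on top."""

    def __init__(self):
        self._stack = []

    def push(self, item):
        """Add something the agent wishes to look at."""
        self._stack.append(item)

    def current(self):
        """The item the agent should be looking at right now."""
        return self._stack[-1] if self._stack else None

    def done(self):
        """Finished scrutinizing the current item; fall back one level."""
        if self._stack:
            self._stack.pop()
        return self.current()
```

For example, push the room, then the table in it, then the cup on the table; as each item is exhausted, the gaze naturally falls back to the enclosing region.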
These are the ideas Gerard's research lab is using to build an agent that can interact with real people, but they are applicable to games too, if you want to go into that much detail! Part of the work is based on Stephan Raidt's Ph.D. research. Here are some of the relevant papers for you to skim through.
Analyzing and Modeling Gaze During Face-to-Face Interaction
Raidt, S., G. Bailly and F. Elisei
International Conference on Intelligent Virtual Agents, 2007.
Download (PDF, 34 Kb)

Scrutinizing Natural Scenes: Controlling the Gaze of an Embodied Conversational Agent
Picot, A., G. Bailly, F. Elisei and S. Raidt
International Conference on Intelligent Virtual Agents, 2007.
Download (PDF, 421 Kb)

Mutual Gaze During Face-to-Face Interaction
Raidt, S., G. Bailly and F. Elisei
Auditory-visual Speech Processing, 2007.
Download (PDF, 269 Kb)
Communicational and Emotional Virtual Human
Catherine Pelachaud, from INRIA at University of Paris 8, gave her talk about the key features required to make an ECA truly interactive, communicative, emotional and social. She focused on perception, interaction and generation of behaviors. I missed this talk too as it clashed with another AI session, but Catherine was also kind enough to send along her slides.
Screenshot 2: The overall architecture of an Embodied Conversational Agent.
Here are the key ideas of her talk which seem applicable and useful for game AI:
- Understanding Other’s Behavior
The perceptual modules are based on Simon Baron-Cohen’s Theory of Mind. An important part of this is monitoring the behavior of other characters and assigning intentions to them. In particular, humans analyze the facing direction of the locomotion, body, head and eyes.
Tip: Give your characters a simple model of the emotional status of others to make them seem less autistic when making decisions.
- Figuring Out Attention Levels
These different parts of the body are useful for working out the attention level (at any instant in time) and the level of interest (over a time interval). See screenshot 3 below for an idea of how to measure this based on the properties of the body, head, and eyes.
Tip: Create your character behaviors with this attention model in mind to show various levels of interest in other characters (or even the player in your game). This is a great base for building higher-level interactions, like trying to attract the attention of others or avoiding obtrusive behavior.
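A guessed scoring scheme for the idea in screenshot 3: how aligned the body, head, and eyes are with an observer gives the instant attention level, and averaging it over time gives the level of interest. The weights (eyes matter most, body least) are my assumptions, not values from the talk.

```python
def attention_level(body_facing, head_facing, eye_facing,
                    w_body=0.2, w_head=0.3, w_eyes=0.5):
    """Instant attention level in [0, 1].

    Each facing value is 1.0 when that body part is aimed straight at
    the observer and 0.0 when aimed away; weights sum to 1.
    """
    return w_body * body_facing + w_head * head_facing + w_eyes * eye_facing

def interest_level(samples):
    """Level of interest: the attention level averaged over an interval."""
    return sum(samples) / len(samples) if samples else 0.0
```

So a character whose eyes flick toward the player while body and head stay turned away scores a moderate instant attention, but only sustained samples raise the interest level.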
- Behavior Models
Catherine's behavior-generation system is called GRETA, and it is based on Isabella Poggi's Theory of Communicative Acts [1,2]. The idea is to use a model (a.k.a. semantic topology) of the speaker's identity and mind, as well as information about the world.
- Lexicon & Annotations
A simple lexicon is used to map meanings to the signals of particular expressions. This is where all the emotional intelligence is encoded into XML annotations: for example, tilting the head to express an apologetic reaction, or raising the eyebrows for emphasis.
Tip: Annotate events and objects in the world to help your characters react emotionally to the world when you don't have an appropriate story script to go with them.
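A toy version of the lexicon idea, with the annotation lookup done in plain Python rather than XML for brevity: map communicative meanings to facial/head signals, and tag world events with a meaning so characters can react without a script. The entries are invented examples in the spirit of the talk, not GRETA's actual lexicon.

```python
# Hypothetical lexicon: meaning -> list of animation signals to play.
LEXICON = {
    "apology": ["tilt_head", "lower_gaze"],
    "emphasis": ["raise_eyebrows"],
    "surprise": ["raise_eyebrows", "open_mouth"],
}

def react_to(event_annotation):
    """Return the signals a character should play for an annotated event."""
    return LEXICON.get(event_annotation, [])
```

An explosion annotated with "surprise" then produces the same raised eyebrows and open mouth on any nearby character, with no per-event scripting.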
- Distinctive Agents
To make the characters more distinctive, Catherine suggests combining a global scheme (called style/signature of a person) with a local scheme (based on the emotional state or communicative intention) to drive the agent’s behaviors. This is inspired by Maurizio Mancini’s research.
Tip: Give each character a fixed set of emotional responses and write logic to select them in a context-sensitive way.
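A sketch of combining the global and local schemes for that tip: a fixed per-character style (the signature) picks which of a small set of responses fires, and the current emotional state scales it. All names and thresholds here are illustrative assumptions.

```python
# Hypothetical response table: intention -> per-style response.
RESPONSES = {
    "greeting": {"reserved": "nod", "expressive": "wave_and_smile"},
    "disagreement": {"reserved": "frown", "expressive": "shake_head"},
}

def select_response(intention, style, arousal):
    """Pick a response for an intention.

    style: the character's fixed signature (global scheme).
    arousal: current emotional intensity in [0, 1] (local scheme).
    """
    base = RESPONSES.get(intention, {}).get(style)
    if base is None:
        return None
    intensity = "strong" if arousal > 0.5 else "subtle"
    return intensity + "_" + base
```

Two characters with the same intentions then stay recognizably different because the signature never changes, while the moment-to-moment emotion still shows through.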
Screenshot 3: The different levels of attention based on eye/body/head posture.
There are a bunch of papers relating to this on Catherine’s homepage. Note that some are still to be published, so email her if you’re interested in more details.
An Expressive Virtual Agent Head Driven by Music Performance
Mancini, M., R. Bresin and C. Pelachaud
IEEE Transactions on Audio, Speech and Language Processing, 2007.
Download (PDF, 599 Kb)

Multimodal Complex Emotions: Gesture Expressivity and Blended Facial Expressions
Martin, J.-C., R. Niewiadomski, L. Devillers, S. Buisine and C. Pelachaud
"Achieving Human-Like Qualities in Interactive Virtual and Physical Humanoids," International Journal of Humanoid Robotics, Special Edition, 2006.
Download (PDF, 689 Kb)
Don’t expect any of this research to solve all your character-believability problems just yet! Even if you adopt these techniques, you’ll still need motion capture (for all gestures) and voice acting (for all speech) to sustain a level of quality these prototypes can’t yet provide. But you can use the research here to:
- Control eye movement and gazing in a context-sensitive way.
- Provide emotional facial reactions to dynamic events in the world.
- Figure out when to use motion segments or voice clips based on emotions.
Also, it should expand your horizons a little if you’re not familiar with this kind of research!