Chapter 3: Towards Multimodal Human-computer Dialogue by Intelligent Agents

Patrice Clemente, France Télécom R&D, Lannion, France

1 Introduction

Mobile telephones, PDAs, GPS, communicating clothes, infra-red connections, Bluetooth technology, domestic networks, domestic robots, software agents: the list is long. We must face the evidence: communicating objects have already started to invade our lives, and this will continue. Their growing number foreshadows many interaction problems in the future, both between humans and these objects and among the objects themselves.

Moreover, the absence of communication standards between objects will lead to a multitude of protocols, and hence to interoperability problems between communicating objects.

The difficulty of keeping these objects coherent and cohesive, given their number and their autonomy, will produce unexpected and undesirable behaviour in systems or networks of objects.

Respecting the free will and integrity of the individual will be difficult to guarantee. Communications will inevitably keep increasing in number, and information of all kinds will overwhelm users unless they have intelligent, suitable mediators.

If no precautions are taken, systems will become useless or unusable. To avoid these pitfalls, one must keep control over objects and systems. This requires the latter to "understand" precisely the desires and needs of the user, an indispensable condition for satisfying them, whatever media the user employs or prefers. Objects and systems must respond to the user's requests and adapt their behaviour to his/her personal profile, to the context (situation, history, etc.), and to the type of task.

Intelligent agents, autonomous software entities able to reason and act, can bring useful solutions to these problems.

An intelligent agent can perceive and act on its environment. It can therefore control "unintelligent" communicating objects such as actuators.

Moreover, an intelligent agent equipped with dialogue capacities can communicate. It can thus interact with other dialoguing agents, which may, for example, deliver information or be embedded in communicating objects. The means of communication is then an inter-agent ^[1] communication language.
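To make the idea of an inter-agent communication language more concrete, here is a minimal Python sketch of a message loosely modeled on the FIPA ACL string syntax mentioned in footnote 1. It is an illustration under stated assumptions, not the FIPA specification or any agent platform's API: the class `ACLMessage`, its `to_string` method, the small `PERFORMATIVES` subset, the default content language `fipa-sl` and the sample content expression are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical subset of communicative acts (performatives); FIPA defines more.
PERFORMATIVES = {"inform", "request", "query-if", "agree", "refuse", "not-understood"}


@dataclass
class ACLMessage:
    """Simplified inter-agent message, loosely modeled on FIPA ACL (a sketch)."""
    performative: str
    sender: str
    receivers: List[str]
    content: str               # assumed not to contain double quotes (no escaping here)
    language: str = "fipa-sl"  # content language; an assumed default for this sketch
    conversation_id: str = ""

    def __post_init__(self) -> None:
        # Reject communicative acts outside our small, illustrative subset.
        if self.performative not in PERFORMATIVES:
            raise ValueError(f"unsupported performative: {self.performative}")
        if not self.receivers:
            raise ValueError("a message needs at least one receiver")

    def to_string(self) -> str:
        """Render the message in a FIPA-ACL-like parenthesised string form."""
        receivers = " ".join(f"(agent-identifier :name {r})" for r in self.receivers)
        parts = [
            f"({self.performative}",
            f" :sender (agent-identifier :name {self.sender})",
            f" :receiver (set {receivers})",
            f' :content "{self.content}"',
            f" :language {self.language}",
        ]
        if self.conversation_id:
            parts.append(f" :conversation-id {self.conversation_id}")
        return "\n".join(parts) + ")"


if __name__ == "__main__":
    # A personal assistant agent asks a (hypothetical) lamp agent to act.
    msg = ACLMessage(
        "request",
        "user-assistant",
        ["lamp-agent"],
        "((action (agent-identifier :name lamp-agent) (switch-on lamp1)))",
    )
    print(msg.to_string())
```

Restricting messages to a fixed set of performatives mirrors the point made in footnote 1: because each communicative act has an agreed meaning, the receiving agent knows whether it is being informed, asked a question, or asked to act.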

Finally, an intelligent agent can converse naturally with humans. Throughout the dialogue, it can help people achieve their goals, deliver relevant information to them, carry out actions (possibly on their behalf) and monitor their resources, all dynamically and on request. In this case, the means of interaction traditionally used is a natural language, such as English.

A system into which an agent is introduced thereby benefits from the agent's intelligence. The agent acts as an understandable and cooperative interlocutor. It can serve, for example, as an assistant or personal secretary, learning its owner's particular characteristics and adapting to them. The agent can also act as a mediator, protecting the user from all kinds of intrusions from his/her environment, such as unwanted or low-priority information.

For example, on entering a store, a user does not always want the items matching his/her shopping list to appear on the PDA. Likewise, when approaching products in the same store, he/she may not want the PDA to display prices, even though this function is always available.
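The mediator role in the store example can be pictured as a filter between incoming notifications and the user's PDA. The minimal Python sketch below assumes a hypothetical preference model: `Notification`, `UserProfile`, `accepted_kinds`, `mediate` and the notification categories are illustrative names, not part of the author's system. A real agent would also weigh context, history and task type, as discussed above. Explicit requests always pass the filter, so a capability such as price display stays available even when unsolicited prices are blocked.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Notification:
    kind: str                # hypothetical category, e.g. "shopping-match" or "price"
    text: str
    requested: bool = False  # True if the user explicitly asked for this information


@dataclass
class UserProfile:
    # Hypothetical preference model: categories the user accepts without asking.
    accepted_kinds: Set[str] = field(default_factory=set)


def mediate(incoming: List[Notification], profile: UserProfile) -> List[Notification]:
    """Keep notifications the user requested or has said he/she accepts unprompted."""
    return [n for n in incoming if n.requested or n.kind in profile.accepted_kinds]


if __name__ == "__main__":
    # A user who wants neither shopping-list matches nor prices pushed to the PDA.
    profile = UserProfile()
    incoming = [
        Notification("shopping-match", "Coffee is in aisle 3"),
        Notification("price", "Coffee maker: 49 EUR"),
        Notification("price", "Coffee: 4 EUR", requested=True),  # user asked explicitly
    ]
    for n in mediate(incoming, profile):
        print(n.text)  # only the explicitly requested price is shown
```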

We will examine one particular application of intelligent agents in more detail: human-computer dialogue (HCD), and more specifically multimodal ^[2] HCD.
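Footnote 2 characterises a modality by the structure of the information it conveys and links it to a communication mode, which yields input and output modalities. The small Python sketch below encodes that classification. Two points are assumptions: the names (`InfoStructure`, `Mode`, `Modality`, `is_multimodal`) are hypothetical, and acquisitive/productive modes are read from the system's point of view, so acquisitive means input. The modalities' intrinsic properties are not modelled, and the assignment of the example modalities to structures is only illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class InfoStructure(Enum):
    """Structure of the information a modality conveys."""
    LINGUISTIC = "linguistic"
    GRAPHIC = "graphic"
    HAPTIC = "haptic"


class Mode(Enum):
    """Communication mode, seen from the system's side (an assumption)."""
    ACQUISITIVE = "acquisitive"  # the system acquires information: input
    PRODUCTIVE = "productive"    # the system produces information: output


@dataclass(frozen=True)  # frozen, so modalities are hashable and can go in a set
class Modality:
    name: str
    structure: InfoStructure
    mode: Mode

    @property
    def direction(self) -> str:
        return "input" if self.mode is Mode.ACQUISITIVE else "output"


def is_multimodal(modalities: Iterable[Modality]) -> bool:
    """True when at least two distinct modalities are in use."""
    return len(set(modalities)) >= 2


if __name__ == "__main__":
    speech = Modality("speech recognition", InfoStructure.LINGUISTIC, Mode.ACQUISITIVE)
    pointing = Modality("touch-screen pointing", InfoStructure.HAPTIC, Mode.ACQUISITIVE)
    map_view = Modality("map display", InfoStructure.GRAPHIC, Mode.PRODUCTIVE)
    print(speech.direction, map_view.direction)  # input output
    print(is_multimodal([speech, pointing]))     # True
```

Speech combined with a pointing gesture, as in the example above, is the typical input pairing behind the multimodal references to objects discussed in this chapter.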

First, we will outline traditional approaches to HCD and current work on multimodal HCD. We will then turn to the core of this chapter: the phenomenon of multimodal reference to objects. After reviewing the main problems of linguistic and multimodal reference, we will introduce our formalism for multimodal reference, made possible by an original representation of objects. We will then present a theoretical model of a multimodal referring act, illustrated by a short example. We will conclude with technical remarks on our model and its implementation, and with more general remarks on the systems it makes it possible to develop.

^[1] This language can be ACL (Agent Communication Language), proposed by the FIPA consortium. ACL is founded on a formal definition of the communicative acts exchanged between agents, which makes unambiguous interactions possible.

^[2] i.e., using several communication modalities. Communication modalities are defined by the structure of the information they convey (linguistic, graphic, haptic, etc.) and by their intrinsic properties. Since they are linked to communication modes (acquisitive and productive), they can be classified into input and output modalities (see [BER 97] for a survey of representation modalities).