TOWARD ROBUST SPOKEN-LANGUAGE SYSTEMS

David G. Novick

Center for Spoken Language Understanding,
Oregon Graduate Institute,
P.O. Box 91000,
Portland, OR, 97291-1000

CONTACT INFORMATION

novick@cse.ogi.edu, voice 503-690-1156, fax 503-690-1334

WWW PAGE

http://www.cse.ogi.edu/~novick

PROGRAM AREA

Speech and Natural Language Understanding.

KEYWORDS

Dialogue, pragmatics, mutuality, spontaneous language, robustness

PROJECT SUMMARY

At OGI, this research program seeks to remediate three key sources of variation that limit robust performance of spoken-language systems: variation in the acoustic environment, variation in pronunciation of words, and variation in dialogue structures. My research on dialogue investigates (1) use of prosody for speech-act disambiguation and (2) conversational control acts through empirical analysis of task-oriented dialogue. The results of my team's efforts include evidence that some pragmatic disambiguation of speech acts in spoken dialogue is possible using prosodic cues to distinguish acknowledgments from non-acknowledgments, and an empirically based model of "mutuality strategies" that help to predict acknowledgment (or other acceptance levels) in dialogue. We also demonstrated that knowledge of dialogue context increases intelligibility and showed that dialogue prompt styles, which can reduce variation in users' utterances, can be automatically generated.

PROJECT REFERENCES

Fanty, M., Sutton, S., Novick, D., and Cole, R. (1995). Automated appointment scheduling, ESCA Workshop on Spoken Dialogue Systems, Vigso, Denmark, May, 1995, 144-147.

Novick, D., and Ward, K. (1995). The effect of context on the intelligibility of dialogue, Eurospeech'95, Madrid, Spain, September, 1995, 1235-1238.

Novick, D., and Hansen, B. (1995, invited). Mutuality strategies for reference in task-oriented dialogue, Twente Workshop on Language Technology 9 (TWLT9), Enschede, The Netherlands, June, 1995, 83-93.

Sutton, S., Hansen, B., Lander, T., Novick, D., and Cole, R. (1995). Evaluating the effectiveness of dialogue for an automated spoken questionnaire. Technical Report CSE 95-012, Department of Computer Science and Engineering, Oregon Graduate Institute of Science and Technology.

Ward, K., and Novick, D. (1995). Prosodic cues to word usage, Proceedings of ICASSP-95, Detroit, MI, May, 1995, 620-623.

Ward, K., and Novick, D. (1995). Integrating multiple cues for spoken-language understanding, Conference on Human Factors in Computing Systems (CHI'95), Denver, CO, May, 1995.

Ward, K., and Novick, D. (1994). On the need for a theory of knowledge sources for spoken-language understanding, Working Notes of the AAAI-94 Workshop on Integration of Natural Language and Speech Processing, Seattle, WA, July, 1994, 23-30.

AREA BACKGROUND

The overall area of my research could be described generally as computational pragmatics. The fundamental question I seek to answer is "What does it mean to interact?" With this knowledge, developers could build systems that are more useful and intuitive for many users. These factors are particularly important for emerging technologies such as ubiquitous access to broadband information carriers and computer-based telecommunications. Research methods involve observation, analysis and modeling of naturally situated interaction, including face-to-face, telephone and computer-mediated conditions; simulation of interaction via computational agents; and implementation and testing of systems.

Foundations of the research include work on mutual knowledge and control in conversation. Sacks, Schegloff and Jefferson (1974) described a model of turn-taking in conversation that defined this area of inquiry. The notion of mutual knowledge was significantly developed by Clark and Marshall (1981), who used co-presence heuristics to attack the thorny problem of infinite regress in simple modal logic representations of conversational understanding. Key models of conversational structure based on mutual knowledge and confirmation include Grosz and Sidner's (1986) focus model and Clark and Schaefer's (1987, 1989) model of contribution trees defined by patterns of presentation and acceptance.

A significant aspect of this research area involves modeling and simulating conversations, especially with respect to conversational control. The first known simulation of conversation by computational agents was reported by Power (1979). In this work, a pair of agents named John and Mary conversed in simple English about opening a door and moving between rooms; conversational turn-taking was controlled by the system in which the agents were implemented. In Novick (1988, 1991), two agents conversed using speech acts, reproducing human conversations in a laboratory-based "letter sequence" task. Subsequently, Traum and Hinkelman (1992), then both students of Allen at Rochester, also used agents to simulate conversations in a laboratory-based "trains" domain. The first replication of a naturally occurring conversation was reported by Novick and Ward (1993).

AREA REFERENCES

Clark, H. H., and Marshall, C. R. (1981). Definite reference and mutual knowledge. In A. K. Joshi, B. L. Webber, and I. A. Sag (Eds.), Elements of discourse understanding (pp. 10-63). New York: Cambridge University Press.

Clark, H. H., and Schaefer, E. F. (1987). Collaborating on contributions to conversations. Language and Cognitive Processes, 2, 19-41.

Clark, H. H., and Schaefer, E. F. (1989). Contributing to discourse. Cognitive Science, 13(2), 259-294.

Grosz, B. J., and Sidner, C. L. (1986). Attention, intentions, and the structure of discourse. Computational Linguistics, 12, 175-204.

Novick, D. G. (1988). Control of mixed-initiative discourse through meta-locutionary acts: A computational model. Technical Report CIS-TR-88-18, Department of Computer and Information Science, University of Oregon.

Novick, D. (1991). Controlling interaction with meta-acts (refereed poster). Conference on Human Factors in Computing Systems (CHI'91), New Orleans, LA, May, 1991.

Novick, D., and Ward, K. (1993). Mutual beliefs of multiple conversants: A computational model of collaboration in air traffic control. Proceedings of AAAI'93, Washington, DC, July, 1993, 196-201.

Power, R. (1979). The organization of purposeful dialogues. Linguistics, 17, 107-153.

Sacks, H., Schegloff, E., and Jefferson, G. (1974). A simplest systematics for the organization of turn-taking in conversation. Language, 50, 696-735.

Traum, D., and Hinkelman, E. (1992). Conversation acts in task-oriented spoken dialogue. Computational Intelligence, 8(3).

RELATED PROGRAM AREAS

(3) Other Communication Modalities, (5) Usability and User-Centered Design, and (6) Intelligent Interactive Systems for Persons with Disabilities.

POTENTIAL RELATED PROJECTS

The models of dialogue that we are developing have straightforward interpretations in terms of HCI beyond spoken-language interfaces. Our work on issues such as mixed-initiative, confirmation and repair should be particularly useful for improving a priori usability, and for modeling the interactions developed in user-centered design techniques. Part of our research has addressed cross-modal and multi-modal interaction; related work would include understanding, based on models of mutuality, why interaction in uni- or multi-modal systems would vary in effectiveness.