Special TUESDAY joint Psychology/CS seminar: Toward Mobile and Adaptive Conversational Interfaces

November 16, 2004
2:50 pm - 4:00 pm
Halligan 106
Host: Rob Jacob


During the past decade, rapid advances in spoken language technology, natural language processing, dialogue modeling, multimodal interfaces, animated character design, and mobile applications have all stimulated interest in a new class of conversational interfaces. Such systems are being designed to support users' performance in a variety of task applications (commercial, medical, educational, in-vehicle), and many include animated characters intended to facilitate user performance. However, developing robust systems that process conversational speech remains a challenging problem, largely because users' spoken language can be extremely variable.

In this talk, I'll describe research in our lab that has identified a new source of variability in users' spoken language to computers: people spontaneously and rapidly adapt the basic acoustic-prosodic features of their speech signal to the text-to-speech (TTS) output they hear from a computer partner. These speech adaptations are delivered dynamically, since users will quickly readapt their speech when communicating with a different computer voice. They also are flexibly bidirectional -- for example, users will increase their own speech amplitude and rate when conversing with a computer partner that has louder and faster TTS output, and will decrease these features when the TTS is quieter and slower. In fact, an analysis of speakers' amplitude, durational features, and dialogue response latencies confirmed that these adaptations can be substantial in magnitude (10-50%), with the largest adaptations involving utterance pause structure and amplitude.

This research underscores the need for new speech and multimodal systems that can adapt to users and their communication context. It also emphasizes the importance of auditory interface design for next-generation mobile systems. Implications are discussed for designing future conversational interfaces that are more reliable, well synchronized, and supportive of user performance.

Brief Biography: Sharon Oviatt is a Professor and Co-Director of the Center for Human-Computer Communication (CHCC) in the Dept. of Computer Science at Oregon Health & Science University (OHSU). She received a B.A. with Highest Honors from Oberlin College and a PhD from the University of Toronto. Her research focuses on human-computer interaction, spoken language and multimodal interfaces, and mobile and highly interactive systems. Examples of recent work involve the development of novel design concepts for multimodal and mobile interfaces, robust interfaces for real-world field environments and diverse users, and adaptive conversational interfaces with animated software partners. This work is funded by grants and contracts from NSF, ONR, DARPA, and various corporate sources. She is an active member of the international HCI, speech, and multimodal communities, and has published over 90 scientific articles, including work featured in recent special issues of Communications of the ACM, Human Computer Interaction, Transactions on Human Computer Interaction, IEEE Multimedia, Proceedings of the IEEE, and IEEE Transactions on Neural Networks. She was the recipient of an NSF Special Extension for Creativity Award in 2000, and was Chair of the Fifth International Conference on Multimodal Interfaces in 2003.