Special TUESDAY joint Psychology/CS seminar: Toward Mobile and Adaptive Conversational Interfaces
Abstract
During the past decade, rapid advances in spoken language
technology, natural language processing, dialogue modeling,
multimodal interfaces, animated character design, and mobile
applications have all stimulated interest in a new class of
conversational interfaces. Such systems are being designed to
support users' performance in a variety of task domains
(commercial, medical, educational, in-vehicle), and many
incorporate animated characters intended to facilitate that
performance. However, the development of robust systems that
process conversational speech is a challenging problem, largely
because users'
spoken language can be extremely variable. In this talk, I'll
describe research in our lab that has identified a new source of
variability in users' spoken language to computers. Specifically,
people spontaneously and rapidly adapt the basic acoustic-prosodic
features of their speech signal to the text-to-speech output they
hear from a computer partner. These speech adaptations occur
dynamically: users quickly readapt their speech when communicating
with a different computer voice.
They are also flexibly bidirectional -- for example, users will
increase their own speech amplitude and rate when conversing with a
computer partner that has louder and faster text-to-speech (TTS)
output, and will decrease these features when the TTS is quieter and
slower. In fact, an analysis of speakers' amplitude, durational
features, and dialogue response latencies confirmed that these
adaptations can be substantial in magnitude (10-50%), with the
largest adaptations involving utterance pause structure and
amplitude. This research underscores the need for new speech and
multimodal systems that can adapt to users and their communication
context. It also emphasizes the importance of auditory interface
design for next-generation mobile systems. Implications are
discussed for designing future conversational interfaces that are
more reliable, well synchronized, and supportive of user performance.
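As a rough illustration of how adaptation magnitudes like the 10-50%
figures above could be computed, here is a minimal Python sketch,
assuming per-utterance measurements of a single acoustic-prosodic
feature (e.g., speaking rate) from a baseline session and a session
with a manipulated TTS partner; the function and the sample values
are hypothetical, not data from the talk.

```python
import statistics

def percent_adaptation(baseline, adapted):
    """Percent change in the mean of one speech feature between a
    baseline condition and a condition with a manipulated TTS voice."""
    base = statistics.mean(baseline)
    return 100.0 * (statistics.mean(adapted) - base) / base

# Hypothetical per-utterance speaking rates (syllables/sec); the
# numbers are illustrative only, not measurements from this research.
baseline_rate = [4.1, 3.9, 4.3, 4.0]
with_fast_tts = [4.8, 5.0, 4.7, 4.9]

print(f"rate adaptation: {percent_adaptation(baseline_rate, with_fast_tts):+.1f}%")
```

A positive value indicates convergence toward a faster TTS voice; the
same metric applies equally to amplitude or pause duration.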
Brief Biography:
Sharon Oviatt is a Professor and Co-Director of the Center for
Human-Computer Communication (CHCC) in the Dept. of Computer Science at
Oregon Health & Science University (OHSU). She received a B.A. with
Highest Honors from Oberlin College and a Ph.D. from the University of
Toronto.
Her research focuses on human-computer interaction, spoken language
and multimodal interfaces, and mobile and highly interactive systems.
Examples of recent work include the development of novel design
concepts for multimodal and mobile interfaces, robust interfaces
for real-world field environments and diverse users, and adaptive
conversational interfaces with animated software partners. This
work is funded by grants and contracts from NSF, ONR, DARPA, and
various corporate sources. She is an active member of the
international HCI, speech and multimodal communities, and has
published over 90 scientific articles, including work featured in
recent special issues of Communications of the ACM, Human-Computer
Interaction, Transactions on Computer-Human Interaction, IEEE
Multimedia, Proceedings of the IEEE, and IEEE Transactions on Neural
Networks. She received an NSF Special Extension for Creativity Award
in 2000, and was Chair of the Fifth International Conference on
Multimodal Interfaces in 2003.