Media Laboratory
Massachusetts Institute of Technology
Email: geek@media.mit.edu
Phone: +1-617-253-5156
Fax: +1-617-258-6264
But strongly related to Adaptive User Interfaces.
Previous work has used the acoustic cues of pauses and pitch, usually in isolation, to try to identify emphasized or salient portions of a monologue. Well-developed theories of discourse, however, suggest a much richer underlying structure. We will exploit this discourse structure to provide a hierarchical "view" of the recording, allowing listening at different levels of detail. Detecting discourse structure automatically will require correlating multiple acoustic cues of both pitch and duration. The results will then be evaluated against an annotated corpus. We do not expect perfect results; we hope to obtain results good enough to serve as cues for the speech skimming user interface.
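The proposal does not specify how the pitch and duration cues will be combined. As a minimal sketch only, assuming phrases have already been segmented and that each carries a preceding-pause duration and a pitch-reset measure (both hypothetical features, loosely in the spirit of the intonational studies cited here), a weighted score can map each phrase onset to a level of the hierarchy, and the predicted boundaries can then be compared against an annotated corpus using precision and recall. The weights, thresholds, feature names, and choice of metric are illustrative assumptions, not values or methods from this work.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Phrase:
    """One intonational phrase with pre-extracted acoustic features (hypothetical)."""
    start: float         # onset time in the recording, seconds
    pause_before: float  # silence preceding the phrase, seconds
    pitch_reset: float   # F0 peak minus the previous phrase's F0 peak, Hz


def boundary_score(p: Phrase, w_pause: float = 1.0, w_pitch: float = 0.02) -> float:
    """Combine pause (duration) and pitch cues into one score; weights are illustrative."""
    # Only an upward pitch reset counts as evidence for a new segment here.
    return w_pause * p.pause_before + w_pitch * max(p.pitch_reset, 0.0)


def assign_levels(phrases: List[Phrase], thresholds: List[float]) -> List[int]:
    """Assign each phrase a hierarchy level: 0 = major boundary, larger = finer.

    `thresholds` must be in descending order. A phrase gets the index of the
    first threshold its score reaches, or len(thresholds) if it reaches none.
    """
    levels = []
    for p in phrases:
        score = boundary_score(p)
        level = len(thresholds)
        for k, t in enumerate(thresholds):
            if score >= t:
                level = k
                break
        levels.append(level)
    return levels


def precision_recall(predicted: Set[int], marked: Set[int]) -> Tuple[float, float]:
    """Compare predicted boundary indices with boundaries marked in an annotated corpus."""
    hits = len(predicted & marked)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(marked) if marked else 0.0
    return precision, recall
```

For example, with thresholds `[2.0, 1.0]`, a phrase preceded by a 1.5 s pause and a 40 Hz pitch reset scores 2.3 and is treated as a major (level 0) boundary, while one with a 0.8 s pause and a 20 Hz reset scores 1.2 and lands at level 1. A linear score is the simplest way to correlate the cues; a trained classifier could replace it without changing the rest of the sketch.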
The user interface will be built around a physical interaction device rather than a traditional graphical user interface. Our goal is to prototype a device that could be built into suitable portable hardware, allowing people to listen in situations where conventional interfaces are inappropriate, such as while commuting or exercising. Determining the physical form of the interface is itself part of this research, but we propose that it support a model of "audio zooming" that facilitates rapid scanning of the stored material at different levels of detail. The interface will include time scaling, skimming at various granularities, and acoustic feedback conveying both the discourse structure and the portions that have been skipped. It will be designed iteratively, with continuous user evaluation.
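The proposal leaves the exact behavior of "audio zooming" open. One minimal reading, offered purely as an assumption, treats the zoom level as a cutoff on the discourse hierarchy: zooming out plays only the opening seconds of major segments, zooming in admits finer ones, and time scaling divides the resulting listening time. The `Segment` type, the fixed-length excerpt rule, and the parameter names are hypothetical; actual playback, time compression of the audio signal, and acoustic feedback for skipped material are not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """A stretch of recorded speech tagged with its level in a discourse hierarchy."""
    start: float  # seconds
    end: float    # seconds
    level: int    # 0 = coarsest (major topic), larger = finer detail


def zoom_plan(segments: List[Segment], zoom: int, excerpt: float) -> List[Tuple[float, float]]:
    """Choose the (start, end) intervals to play at a given zoom level.

    Zooming out (smaller `zoom`) keeps only coarse segments; each kept segment
    contributes at most `excerpt` seconds from its beginning.
    """
    plan = []
    for seg in sorted(segments, key=lambda s: s.start):
        if seg.level <= zoom:
            plan.append((seg.start, min(seg.end, seg.start + excerpt)))
    return plan


def listening_time(plan: List[Tuple[float, float]], speed: float) -> float:
    """Wall-clock time to hear `plan` when time-scaled by `speed` (2.0 = twice as fast)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return sum(end - start for start, end in plan) / speed
```

With a 90-second recording split into four segments at levels 0, 2, 1, and 2, for instance, zoom level 1 with 5-second excerpts selects 10 seconds of audio, or 5 seconds when played at double speed. Playing segment openings is only one possible skimming policy; a user study of the kind proposed here would be needed to choose among alternatives.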
The other related background area is the traditional field of human-computer interaction, in which a small number of projects (many from the same lab at MIT) have explored approaches for capturing, and later interacting with, digital recordings of speech.
Arons, B. SpeechSkimmer: Interactively Skimming Recorded Speech. In Proceedings of UIST, ACM, 1993.
Grosz, B. and Hirschberg, J. Some Intonational Characteristics of Discourse Structure. In Proceedings of the International Conference on Spoken Language Processing, pages 429-432, 1992.
Grosz, B. and Sidner, C. Attention, Intentions, and the Structure of Discourse. Computational Linguistics, 12(3):175-204, 1986.
Schmandt, C. and Mullins, A. AudioStreamer: Exploiting Simultaneity for Listening. In Proceedings of CHI 95 Short Papers, ACM SIGCHI, 1995.
Stifelman, L. J. A Discourse Analysis Approach to Structured Speech. In Proceedings of the AAAI 95 Spring Symposium Series: Empirical Methods in Discourse Interpretation and Generation, 1995.
Usability and User-Centered Design.