Jack Mostow and Maxine Eskenazi

Robotics Institute and Human-Computer Interaction Institute
Carnegie Mellon University


Mail: CMU Project LISTEN, 215 Cyert Hall, 4910 Forbes Avenue, Pittsburgh, PA 15213-3890


Telephone: (412) 268-1330

FAX: (412) 268-6298



2. Speech and Natural Language Understanding.


Project LISTEN, automated reading coach, continuous speech recognition, prosodic analysis, dialogue, speech interfaces for children, voice recognition, education, children, literacy


To carry on effective, natural, spoken dialogue, computers will need to do more than recognize and interpret spoken words -- they will also need to be sensitive to the prosodic information encoded in how the words are spoken. Human listeners use this information to extract additional meaning from the speech signal, to guide their participation in dialogue, and to draw inferences about each other that can help them in performing the task at hand. Automated prosodic analysis has been studied much less than speech recognition, and until now has focussed more on improving the accuracy of word recognition than on improving the effectiveness of dialogue at accomplishing a real task. The proposed research will investigate the hypothesis that detecting and exploiting prosodic cues can help computers guide spoken dialogue.

The proposed case study of prosody focuses on an educational task that combines intrinsic national importance with compelling methodological advantages. The proposed educational task is to "coach" children's oral reading -- that is, display text on the screen, listen to a child read it aloud, detect the child's mistakes, decide when and how to intervene, and provide help and encouragement. The proposed research builds on the code, data, and experience gained from a working prototype of such a coach, developed with prior NSF support.
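The "detect the child's mistakes" step described above can be pictured as aligning what the recognizer heard against the displayed text. The following is only a minimal sketch under stated assumptions, not the prototype's actual method: it assumes the recognizer output is already available as a list of words, it swaps in a generic sequence alignment from Python's standard library, and the names normalize and find_reading_mistakes are hypothetical.

```python
from difflib import SequenceMatcher

PUNCTUATION = ".,;:!?\"'"

# Hypothetical labels for the three alignment outcomes (an assumption,
# not terminology taken from the proposal).
KIND = {"replace": "substitution", "delete": "omission", "insert": "insertion"}


def normalize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def find_reading_mistakes(displayed_text, heard_words):
    """Align the displayed text with the words heard and list the mismatches.

    heard_words is assumed to be a recognizer transcript, one word per item.
    """
    target = normalize(displayed_text)
    heard = [w.lower() for w in heard_words]
    matcher = SequenceMatcher(a=target, b=heard, autojunk=False)
    mistakes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # this stretch of text was read as written
        mistakes.append({
            "kind": KIND[tag],
            "expected": target[i1:i2],
            "heard": heard[j1:j2],
            "position": i1,  # index of the first affected word in the text
        })
    return mistakes


if __name__ == "__main__":
    transcript = ["the", "cat", "sit", "on", "mat"]
    for mistake in find_reading_mistakes("The cat sat on the mat.", transcript):
        print(mistake)
```

A real coach would also have to cope with recognizer errors, so a mismatch found this way is at best evidence of a reading mistake, not proof of one.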

The proposed research will focus on improving four aspects of dialogue -- taking turns, handling speech repairs, preventing dialogue breakdown, and modelling the speaker. In the context of the reading task, these aspects include detecting a number of pedagogically significant events, such as when readers complete a passage, correct themselves, or encounter difficulty in identifying a word or comprehending a passage. The proposed research will use prosodic cues to help detect these events in order to make the dialogue between student and coach more effective in achieving its educational objectives.
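As one illustration of how a prosodic cue might flag a pedagogically significant event, the sketch below treats a long silence before a word as a possible sign of difficulty identifying that word. Everything specific here is an assumption made for illustration: the proposal does not name pause length as a cue, the TimedWord record and find_hesitations function are hypothetical, and the one-second threshold is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """A recognized word with start and end times in seconds (assumed input)."""
    text: str
    start: float
    end: float


def find_hesitations(words, min_pause=1.0):
    """Return (word, pause) pairs where the silence before a word is long.

    min_pause is an arbitrary illustrative threshold, not a value from
    the proposal.
    """
    hesitations = []
    for previous, current in zip(words, words[1:]):
        pause = current.start - previous.end
        if pause >= min_pause:
            hesitations.append((current.text, round(pause, 2)))
    return hesitations


if __name__ == "__main__":
    reading = [
        TimedWord("the", 0.0, 0.3),
        TimedWord("enormous", 1.8, 2.6),
        TimedWord("dog", 2.7, 3.0),
    ]
    print(find_hesitations(reading))
```

A coach could use such a flag to decide when to offer help with a word, although a fixed threshold would likely need tuning to each reader's pace.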

Expected outcomes include not only improvements in the reading coach, but more generally the discovery of robust prosodic phenomena, methods for detecting them, and principles for using them to improve spoken communication so as to better accomplish the task at hand. This work will lay essential foundations for using prosody to achieve graceful, effective spoken dialogue between humans and computers.


[Hauptmann et al 93] A. G. Hauptmann, L. L. Chase, and J. Mostow. Speech Recognition Applied to Reading Assistance for Children: A Baseline Language Model. In Proceedings of the 3rd European Conference on Speech Communication and Technology (EUROSPEECH93), pages 2255-2258. Berlin, September, 1993.

[Mostow et al 93] J. Mostow, A. G. Hauptmann, L. L. Chase, and S. Roth. Towards a Reading Coach that Listens: Automated Detection of Oral Reading Errors. In Proceedings of the Eleventh National Conference on Artificial Intelligence (AAAI-93), pages 392-397. American Association for Artificial Intelligence, Washington, DC, July, 1993.

[Mostow et al 94a] J. Mostow, S. Roth, A. Hauptmann, M. Kane, A. Swift, L. Chase, and B. Weide. A reading coach that listens: (edited) video transcript. In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), page 1507. Seattle, WA, August, 1994.

[Mostow et al 94b] J. Mostow, S. Roth, A. G. Hauptmann, and M. Kane. A Prototype Reading Coach that Listens. In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), pages 785-792. American Association for Artificial Intelligence, Seattle, WA, August, 1994. Recipient of the AAAI-94 Outstanding Paper Award.

[Mostow et al 94c] J. Mostow, S. Roth, A. Hauptmann, M. Kane, A. Swift, L. Chase, and B. Weide. A Reading Coach that Listens (6-minute video). In Video Track of the Twelfth National Conference on Artificial Intelligence (AAAI-94). American Association for Artificial Intelligence, Seattle, WA, August, 1994.

[Mostow et al 95] J. Mostow, A. Hauptmann, and S. Roth. Demonstration of a Reading Coach that Listens. In Proceedings of the Eighth Annual Symposium on User Interface Software and Technology. Sponsored by ACM SIGGRAPH and SIGCHI in cooperation with SIGSOFT, Pittsburgh, PA, November, 1995.


Reading is taught orally in grades 1-3 to help children relate printed English to the spoken language they have already acquired. Unfortunately, a shocking percentage of the nation's children lag behind grade level in reading [NCES 93a] and grow up functionally illiterate, at an annual productivity cost measured in hundreds of billions of dollars [USCD 91]. An automated reading coach could give such children hundreds of hours of individualized attention that teachers and parents cannot provide. Thus, Project LISTEN's eventual goal is to help children learn to read better over time. The goal of the current coach is to help them read a given text.

The coach is designed to provide a combination of reading and listening, in which the child reads whenever possible, and the coach helps whenever necessary, so as to provide a pleasant, successful reading experience. The coach's assistance, modelled after expert reading teachers, is intended to support word identification, comprehension, and motivation.


[Adams 90] M. J. Adams. Beginning to Read: Thinking and Learning about Print. MIT Press, Cambridge, MA, 1990.

[Barr et al 91] R. Barr, M. L. Kamil, P. B. Mosenthal, and P. D. Pearson. Handbook of Reading Research. Longman Publishing Group, White Plains, NY, 1991. ISBN 0-8013-0292-7.

[Huang et al 93] X. D. Huang, F. Alleva, H. W. Hon, M. Y. Hwang, K. F. Lee, and R. Rosenfeld. The SPHINX-II speech recognition system: An overview. Computer Speech and Language 7(2):137-148, April, 1993.

[NCES 93a] National Center for Education Statistics. NAEP 1992 Reading Report Card for the Nation and the States: Data from the National and Trial State Assessments. Technical Report No. 23-ST06, U.S. Department of Education, Washington, DC, September, 1993.

[NCES 93b] National Center for Education Statistics. Adult Literacy in America. Technical Report GPO 065-000-00588-3, U.S. Department of Education, Washington, DC, September, 1993.

[OTA 93] Office of Technology Assessment. Adult Literacy and New Technologies: Tools for a Lifetime. Technical Report OTA-SET-550, U.S. Congress, Washington, DC, July, 1993.

[Pearson 84] P. D. Pearson (editor). Handbook of Reading Research. Longman Publishing Group, New York, 1984. ISBN 0-582-28119-9.

[Pierrehumbert and Hirschberg 90] J. Pierrehumbert and J. Hirschberg. The meaning of intonational contours in the interpretation of discourse. In P. Cohen, J. Morgan, and M. Pollack (editors), Intentions and Plans in Communication and Discourse. MIT Press, 1990.

[USCD 91] USCD. Closing the Literacy Gap in American Business. Technical Report, United States Commerce Department, 1991.

[Waibel and Lee 90] A. Waibel and K.-F. Lee. Readings in Speech Recognition. Morgan Kaufmann, San Mateo, CA, 1990.


3. Other Communication Modalities.

4. Adaptive Human Interfaces.

5. Usability and User-Centered Design.

6. Intelligent Interactive Systems for Persons with Disabilities.


Use eye tracking to help guide the reading coach.