2. Speech and Natural Language Understanding.


Project LISTEN, automated reading coach, continuous speech recognition, prosodic analysis, dialogue, speech interfaces for children, voice recognition, education, children, literacy


To carry on effective, natural, spoken dialogue, computers will need to do more than recognize and interpret spoken words -- they will also need to be sensitive to the prosodic information encoded in how the words are spoken. Human listeners use this information to extract additional meaning from the speech signal, to guide their participation in dialogue, and to draw inferences about each other that can help them in performing the task at hand. Automated prosodic analysis has been studied much less than speech recognition, and until now has focussed more on improving the accuracy of word recognition than on improving the effectiveness of dialogue at accomplishing a real task. The proposed research will investigate the hypothesis that detecting and exploiting prosodic cues can help computers guide spoken dialogue.

The proposed case study of prosody focuses on an educational task that combines intrinsic national importance with compelling methodological advantages. The proposed educational task is to "coach" children's oral reading -- that is, to display text on the screen, listen to a child read it aloud, detect the child's mistakes, decide when and how to intervene, and provide help and encouragement. The proposed research builds on the code, data, and experience gained from a working prototype of such a coach, developed with prior NSF support.

The proposed research will focus on improving four aspects of dialogue -- taking turns, handling speech repairs, preventing dialogue breakdown, and modelling the speaker. In the context of the reading task, these aspects include detecting a number of pedagogically significant events, such as when readers complete a passage, correct themselves, or encounter difficulty in identifying a word or comprehending a passage. The proposed research will use prosodic cues to help detect these events in order to make the dialogue between student and coach more effective in achieving its educational objectives.

Expected outcomes include not only improvements in the reading coach, but more generally the discovery of robust prosodic phenomena, methods for detecting them, and principles for using them to improve spoken communication so as to better accomplish the task at hand. This work will lay essential foundations for using prosody to achieve graceful, effective spoken dialogue between humans and computers.


Reading is taught orally in grades 1-3 to help children relate printed English to the spoken language they have already acquired. Unfortunately, a shocking percentage of the nation's children lag behind grade level in reading [NCES 93a] and grow up functionally illiterate, at an annual productivity cost measured in hundreds of billions of dollars [USCD 91]. An automated reading coach could give such children hundreds of hours of individualized attention that teachers and parents cannot provide. Thus Project LISTEN's eventual goal is to help children learn to read better over time. The goal of the current coach is to help them read a given text.

The coach is designed to provide a combination of reading and listening, in which the child reads whenever possible, and the coach helps whenever necessary, so as to provide a pleasant, successful reading experience. The coach's assistance, modelled after expert reading teachers, is intended to support word identification, comprehension, and motivation.


Use eye tracking to help guide the reading coach.