MODELING DISFLUENCIES IN SPONTANEOUS SPEECH

Patti Price

with co-PIs Herbert Clark (Stanford University) and Stefanie Shattuck-Hufnagel (Massachusetts Institute of Technology), and collaborators Elizabeth Shriberg (on leave from SRI at IPO, Eindhoven, the Netherlands) and John Bear (SRI)


Speech Technology and Research Laboratory
SRI International
333 Ravenswood Ave., EJ147
Menlo Park, CA 94025

CONTACT INFORMATION

PHONE: 415-859-5845
FAX: 415-859-5984
EMAIL: pprice@speech.sri.com

WWW PAGE

http://www-speech.sri.com

PROGRAM AREA

Speech and Natural Language Understanding.

KEYWORDS

Disfluency, prosody, speech recognition, spontaneous speech, hesitation, speech understanding

PROJECT SUMMARY

As spoken language understanding systems evolve towards use with natural, spontaneous speech, disfluencies ("uh", "um", word fragments, phrase fragments, etc.) can be expected to rise to rates similar to those observed in spontaneous conversational speech, where they affect one-third of the utterances or more. This project investigates speech disfluencies in various corpora and in controlled experiments, with the aim of incorporating the results into SRI's spoken language understanding system. This multi-disciplinary investigation is undertaken by a team representing expertise in linguistics, psycholinguistics, and computational linguistics. Each of the investigators has experience with disfluencies, but from a different, complementary perspective.

PROJECT REFERENCES

Brennan, S. E., and Williams, M. (1995) "The feeling of another's knowing: Prosody and filled pauses as cues to listeners about the metacognitive states of speakers," Journal of Memory and Language, 34, pp. 383-398.

Clark, H. H. (1994) "Discourse in production," in M. A. Gernsbacher (Ed.), Handbook of Psycholinguistics, San Diego: Academic Press.

Clark, H. H. (1994) "Managing problems in speaking," Speech Communication, 15, pp. 243-250.

Clark, H. H. (1996) Using Language. Cambridge: Cambridge University Press.

Clark, H. H. and Wasow, T. (in preparation). Repeated pronouns and articles in spontaneous speech.

Clark, H. H., and Bly, B. (1994) "Pragmatics and discourse," in J. L. Miller and P. D. Eimas (Eds.), Handbook of Perception and Cognition, Vol. 11: Speech, Language, and Communication, New York: Academic Press.

Dilley, L. and S. Shattuck-Hufnagel (1995) "Individual differences in the glottalization of vowel-initial syllables," J. Acoust. Soc. Am., Vol. 97, No. 5, Pt. 2, pp. 3418-3419.

Fox Tree, J. E. (1995) "Effects of false starts and repetitions on the processing of subsequent words in spontaneous speech," Journal of Memory and Language, 34.

Fox Tree, J. E. and Clark, H. H. (submitted). Pronouncing "the" as "thee" to signal problems in speaking.

Shriberg, E. (1994) Preliminaries to a Theory of Speech Disfluencies, U. Cal. Berkeley Ph.D. Thesis. To be published, John Benjamins, 1996. Data and thesis available by ftp, contact ees@speech.sri.com.

Shriberg, E. (1995) "Acoustic properties of disfluent repetitions," Int. Congress Phonetic Sciences, Stockholm, Sweden.

AREA BACKGROUND

Spoken language is the medium used first and foremost by humans for accurate and efficient interactive problem solving. As an input modality for human-computer interaction, spoken language can offer: (1) accessibility to an increasing number of people, including those with little or no training, (2) increased access to a growing set of data resources via telephone without a computer terminal, (3) increased power for those already familiar with computer technology, (4) an additional communication channel for more robust communication, for use in unusual environments, and for devices for the disabled, (5) flexibility of modality and use of computers by humans generally, and (6) increased applications and job opportunities in areas that will grow out of increased exposure of people to the potential of technology.

Although there has been significant work devoted to some spontaneous speech phenomena, such as "slips of the tongue," other, much more frequent types of spontaneous speech "disfluencies" (e.g., false starts, hesitations, filled pauses, and related phenomena) have been largely ignored. Such disfluencies are highly prevalent in normal human communication, and though such phenomena may at present be less frequent in human-machine dialogue than in human-human dialogue, the causes and costs (e.g., in terms of cognitive load on the user) of this discrepancy are unknown. Further, because current speech understanding systems do not model disfluencies well, disfluencies, when they do occur, are correlated with speech recognition and understanding errors. It is likely that as human-machine dialogue becomes more common, and as users start to concentrate more on the task at hand than on their speech style, the rates of disfluencies will rise to rates closer to those observed in human-human communication. A better understanding of the interdisciplinary aspects of disfluencies is critical to the development of a principled treatment of these highly frequent attributes of spontaneous speech.

AREA REFERENCES

Bear, J., J. Dowding, and E. Shriberg (1992) "Integrating multiple knowledge sources for detection and correction of repairs in human-computer dialog," Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics.

Levelt, W. (1983) "Monitoring and self-repair in speech," Cognition, Vol. 14, pp. 41-104.

Nakatani, C. and J. Hirschberg (1994) "A corpus-based study of repair cues in spontaneous speech," Journal of the Acoustical Society of America, Vol. 95, No. 3, pp. 1603-1616.

O'Shaughnessy, D. (1994) "Correcting Complex False Starts in Spontaneous Speech," ICASSP-94, Vol. I.

RELATED PROGRAM AREAS

1. Virtual Environments.
3. Other Communication Modalities.
4. Adaptive Human Interfaces.
5. Usability and User-Centered Design.
6. Intelligent Interactive Systems for Persons with Disabilities.

POTENTIAL RELATED PROJECTS

Since all areas below involve natural speech, they all include disfluencies.

1. Virtual Environments. Speech is an important tool in virtual environments: it can enhance the immersion effect, and it can be used as a tool to guide the user through the environment.

3. Other Communication Modalities. Speech is a modality that is known to interact with others, notably vision.

4. Adaptive Human Interfaces. Speech recognition techniques can adapt to the speaker, but more work is needed in both speech recognition and speech synthesis. In addition, work is needed in deciding when it is appropriate to offer speech as an alternative modality.

5. Usability and User-Centered Design. The use of speech as an input and/or output modality is one aspect in the interface design.

6. Intelligent Interactive Systems for Persons with Disabilities. Speech input and output can be important for the blind and dyslexic. These technologies can also be used for those who may not be well understood by people not used to their voices: for example, the profoundly deaf or dysarthric.