with co-PIs Herbert Clark (Stanford University) and Stefanie
Shattuck-Hufnagel (Massachusetts Institute of Technology), and collaborators
Elizabeth Shriberg (on leave from SRI at IPO, Eindhoven, the Netherlands)
and John Bear (SRI)
Speech Technology and Research Laboratory
SRI International
333 Ravenswood Ave., EJ147
Menlo Park, CA 94025
Clark, H. H. (1994) "Discourse in production," in M. A. Gernsbacher (Ed.), Handbook of Psycholinguistics, San Diego: Academic Press.
Clark, H. H. (1994) "Managing problems in speaking," Speech Communication, 15, pp. 243-250.
Clark, H. H. (1996) Using Language. Cambridge: Cambridge University Press.
Clark, H. H. and Wasow, T. (in preparation). Repeated pronouns and articles in spontaneous speech.
Clark, H. H., and Bly, B. (1994) "Pragmatics and discourse," in J. L. Miller and P. D. Eimas (Eds.), Handbook of Perception and Cognition, Vol 11: Speech, Language, and Communication, New York: Academic Press.
Dilley, L. and S. Shattuck-Hufnagel (1995) "Individual differences in the glottalization of vowel-initial syllables," J. Acoust. Soc. Am., Vol. 97, No. 5, Pt. 2, pp. 3418-3419.
Fox Tree, J. E. (1995) "Effects of false starts and repetitions on the processing of subsequent words in spontaneous speech," Journal of Memory and Language, 34.
Fox Tree, J. E. and Clark, H. H. (submitted). Pronouncing "the" as "thee" to signal problems in speaking.
Shriberg, E. (1994) Preliminaries to a Theory of Speech Disfluencies, U. Cal. Berkeley Ph.D. Thesis. To be published, John Benjamins, 1996. Data and thesis available by ftp, contact ees@speech.sri.com.
Shriberg, E. (1995) "Acoustic properties of disfluent repetitions," Int. Congress Phonetic Sciences, Stockholm, Sweden.
Although there has been significant work devoted to some spontaneous speech phenomena, such as "slips of the tongue," other much more frequent types of spontaneous speech "disfluencies" have been largely ignored, e.g., false starts, hesitations, filled pauses and related phenomena. Such disfluencies are highly prevalent in normal human communication, and though such phenomena may at present be less frequent in human-machine dialogue than in human-human dialogue, the causes and costs (e.g., in terms of cognitive load on the user) of this discrepancy are unknown. Further, because current speech understanding systems do not model disfluencies well, when they do occur, they are correlated with speech recognition and understanding errors. It is likely that as human-machine dialogue becomes more common, and as users start to concentrate more on the task at hand than on their speech style, that the rates of disfluencies will rise to rates closer to those observed in human-machine communication. A better understanding of the interdisciplinary aspects of disfluencies is critical to the development of a principled treatment of these highly frequent attributes of spontaneous speech.
Levelt, W. (1983) "Monitoring and self-repair in speech," Cognition, Vol. 14, pp. 41-104,
Nakatani, C. and J. Hirschberg (1994) "A corpus-based study of repair cues in spontaneous speech," Journal Acoustical Society of America", Vol. 95, No. 3, pp. 1603-1616.
O'Shaughnessy, D. (1994) "Correcting Complex False Starts in Spontaneous Speech," ICASSP-94, Vol. I.
1. Virtual Environments. Speech is an important tool in virtual environments: it can enhance the immersion effect, and it can be used as a tool to guide the user through the environment.
3. Other Communication Modalities. Speech is a modality that is known to interact with others, notably vision.
4. Adaptive Human Interfaces. Speech recognition techniques can adapt to the speaker, but more work is needed in both speech recognition and speech synthesis. In addition, work is needed in deciding when it is appropriate to offer speech as an alternative modality.
5. Usability and User-Centered Design. The use of speech as an input and/or output modality is one aspect in the interface design.
6. Intelligent Interactive Systems for Persons with Disabilities. Speech input and output can be important for the blind and dislexic. These technologies can also be used for those who may not be well understood by people not used to their voices: for example, the profoundly deaf or disarthric.