TOWARD ROBUST SPOKEN-LANGUAGE SYSTEMS

Ronald A. Cole

Center for Spoken Language Understanding
Oregon Graduate Institute
P.O. Box 91000
Portland, OR, 97291-1000

CONTACT INFORMATION

cole@cse.ogi.edu, voice 503-690-1159, fax 503-690-1306

WWW PAGE

http://www.cse.ogi.edu/CSLU/

PROGRAM AREA

Speech and Natural Language Understanding.

KEYWORDS

Speech recognition, spoken dialogue systems, portable systems

PROJECT SUMMARY

NSF grants to the Center for Spoken Language Understanding (CSLU) at OGI support basic research in human language technology leading to rapid prototyping and evaluation of spoken dialogue systems for real-world applications. To produce spoken dialogue systems that are robust, accurate and graceful, we are performing research in the following areas:

Environmental robustness. RASTA processing of speech is a proven technique for minimizing the effects of channel variation. Research continues on RASTA processing and on the use of RASTA features in neural networks.
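As a rough illustration of the idea, the sketch below applies a RASTA-style band-pass filter to the time trajectory of one log-spectral band. The coefficients (numerator 0.1 * [2, 1, 0, -1, -2], single pole at 0.98) are an assumption based on the commonly quoted form of the filter and are not given in this summary; the function name rasta_filter, the causal form and the toy signal are illustrative only. A fixed channel multiplies the spectrum, so it adds a constant in the log domain, and the filter's zero at DC removes that constant once the start-up transient has decayed.

```python
import math

# Assumed RASTA filter coefficients (commonly quoted form; not given in this summary).
NUMERATOR = [0.2, 0.1, 0.0, -0.1, -0.2]
POLE = 0.98

def rasta_filter(log_band):
    """Band-pass filter one log-spectral band trajectory (one value per frame)."""
    out = []
    prev = 0.0
    for n in range(len(log_band)):
        # FIR part: slope estimate over the current and previous four frames.
        fir = sum(b * log_band[n - k] for k, b in enumerate(NUMERATOR) if n - k >= 0)
        # IIR part: leaky integrator with a single pole.
        prev = fir + POLE * prev
        out.append(prev)
    return out

# Toy trajectory: a 4 Hz modulation sampled at 100 frames per second.
clean = [math.sin(2 * math.pi * 4 * n / 100.0) for n in range(800)]
# A fixed channel gain becomes a constant offset in the log domain.
shifted = [x + 3.0 for x in clean]

y_clean = rasta_filter(clean)
y_shifted = rasta_filter(shifted)
# Largest difference over the last 100 frames, after the start-up transient.
tail_gap = max(abs(a - b) for a, b in zip(y_clean[-100:], y_shifted[-100:]))
```

In this toy case the two filtered trajectories differ noticeably in the first few frames and become nearly identical later on, which is the channel-normalizing behavior described above.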

Phonetic recognition. More accurate estimation of phoneme probabilities is a major goal at CSLU. Proposed efforts focus on more precise context-dependent modeling, better training techniques for neural networks, and explicit modeling of the dynamics of speech. All of these efforts provide a needed research alternative to HMMs, which are already very well studied.

Vocabulary-independent recognition. Rapid prototyping of spoken dialogue systems depends critically on vocabulary-independent recognition, in which the system recognizes words or phrases for which no training data are available. Vocabulary-independent recognition requires research leading to the automatic generation of expected word pronunciations that model phonological variation, dialects, accents and errors produced by the phonetic recognizer.
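One simple way to picture automatic pronunciation generation is as rule-based expansion of a baseform into variants. The sketch below is purely illustrative: the phoneme symbols, the two rewrite rules, the toy baseform and the function name expand_pronunciations are all hypothetical and are not taken from the CSLU system.

```python
# Hypothetical optional rewrite rules: (phoneme sequence, allowed replacement).
RULES = [
    (("t",), ("dx",)),        # a "t" optionally realized as a flap
    (("ax", "n"), ("en",)),   # schwa + n optionally merged into a syllabic n
]

def expand_pronunciations(baseform):
    """Return the baseform plus every variant reachable by applying the rules."""
    start = tuple(baseform)
    variants = {start}
    frontier = [start]
    while frontier:
        pron = frontier.pop()
        for pattern, replacement in RULES:
            width = len(pattern)
            for i in range(len(pron) - width + 1):
                if pron[i:i + width] == pattern:
                    new = pron[:i] + replacement + pron[i + width:]
                    if new not in variants:
                        variants.add(new)
                        frontier.append(new)
    return sorted(variants)

# Toy baseform (assumed for illustration only).
toy_variants = expand_pronunciations(["b", "ah", "t", "ax", "n"])
```

A real system would also need to weight the variants and to model recognizer errors, which this sketch does not attempt.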

Rejection of extraneous speech. Rejection of out-of-vocabulary words and background sounds is a key problem that must be solved for robust and graceful systems. Improved metrics must be devised for rejecting out-of-vocabulary utterances.
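One common family of rejection metrics compares the score of the best in-vocabulary hypothesis with the score of a general "filler" model. The sketch below assumes that such per-utterance log-likelihoods are already available; the function name accept_utterance, the per-frame normalization, the threshold value and the example scores are illustrative assumptions, not the metric used at CSLU.

```python
def accept_utterance(best_word_loglik, filler_loglik, num_frames, threshold=0.5):
    """Accept when the per-frame log-likelihood ratio exceeds the threshold."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Normalizing by duration keeps long and short utterances comparable.
    ratio = (best_word_loglik - filler_loglik) / num_frames
    return ratio > threshold, ratio

# Made-up scores: the first utterance beats the filler model clearly, the second barely.
in_vocab = accept_utterance(-420.0, -510.0, num_frames=120)
out_vocab = accept_utterance(-495.0, -510.0, num_frames=120)
```

With these made-up scores the first utterance is accepted and the second is rejected; in practice the threshold would have to be tuned on held-out data.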

Speaker adaptation. Research is proposed so that a system improves its recognition performance both while it is being used by a speaker (on-line adaptation) and after use by many speakers. It is important for systems to improve performance for difficult speakers (ones not encountered when training classifiers) without degrading performance for speakers for whom acceptable performance is already achieved. Adaptation must occur at several levels. For example, the system may need to adapt to the speaker's vocal tract characteristics by modifying the signal representation, and to the speaker's accent by modifying the pronunciation models of words. In addition, the system should adapt to speakers off-line (automatically) by training on speech data collected during use and by identifying common responses to prompts, and should modify the recognition vocabulary and language models to reflect frequency of use.
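The last point, re-weighting the vocabulary to reflect frequency of use, can be sketched as a smoothed unigram estimate updated from logged responses to a prompt. The class name ResponseModel, the add-one smoothing and the example responses are assumptions made for illustration only.

```python
from collections import Counter

class ResponseModel:
    """Unigram model over whole responses to one prompt, updated from usage logs."""

    def __init__(self, vocabulary):
        self.vocabulary = list(vocabulary)
        self.counts = Counter()

    def observe(self, response):
        # Only count responses that are already in the recognition vocabulary.
        if response in self.vocabulary:
            self.counts[response] += 1

    def probability(self, response):
        if response not in self.vocabulary:
            return 0.0
        # Add-one smoothing keeps unseen vocabulary items possible.
        total = sum(self.counts.values()) + len(self.vocabulary)
        return (self.counts[response] + 1) / total

model = ResponseModel(["yes", "no", "operator"])
for logged in ["yes", "yes", "no", "yes", "maybe"]:
    model.observe(logged)
p_yes = model.probability("yes")
p_operator = model.probability("operator")
```

Responses outside the vocabulary (here "maybe") are ignored by this sketch; a fuller treatment would track them as candidates for adding to the recognition vocabulary.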

PROJECT REFERENCES

Hermansky, H. and N. Morgan, "RASTA processing of speech," IEEE Transactions on Speech and Audio Processing, 2(4): 578-589, Oct. 1994.

Barnard, E., R. A. Cole, M. Fanty and P. Vermeulen, "Real-world speech recognition with neural networks," in Applications and Science of Artificial Neural Networks, SPIE Vol. 2492, pp. 524-537, Orlando, Fla., April 1995.

Cole, R. A., D. G. Novick, M. Fanty, S. Sutton, B. Hansen and D. Burnett, "Rapid prototyping of spoken language systems: The year 2000 census project," in Proceedings of the International Symposium on Spoken Dialogue, Tokyo, Japan, Nov. 1993.

Hu, Z., E. Barnard and R. A. Cole, "Transition-based feature extraction within frame-based recognition," in Proceedings of Eurospeech 1995, pp. 1555-1558, Madrid, Spain, Sep. 1995.

AREA BACKGROUND

CSLU is a multidisciplinary laboratory for research in human language technology, with 5 faculty, 12 research and technical staff, and 15 students. Our mission is "to perform basic research leading to advances in the state of the art of spoken language systems." This mission statement reflects our commitment to both basic research and system development, and our belief that these activities should be integrated, so that research advances are evaluated in the context of systems used in real applications. This "market-based" approach to research forces us to solve the problems that cause spoken language systems to fail, and it promotes technology transfer. Basic research is performed by faculty and students, while system development and technology transfer activities are performed by research and technical staff.

The main research areas at CSLU include:

Spoken dialogue systems, including research in speech recognition, natural language understanding and dialogue modeling;
Speaker recognition;
Automatic language identification;
Development of human language resources, including (a) the development and distribution of speech corpora in 22 languages, and (b) the development of a toolkit to allow rapid prototyping and evaluation of spoken dialogue systems, and research on the underlying technology.

Descriptions of research and development activities in these areas, along with references to related work, are available on the CSLU Web site (http://www.cse.ogi.edu/CSLU/).