Center for Spoken Language Understanding
Oregon Graduate Institute
P.O. Box 91000
Portland, OR, 97291-1000
Environmental robustness. RASTA (RelAtive SpecTrAl) processing of speech is a proven technique for minimizing the effects of channel variation. Research continues on RASTA processing itself and on the use of RASTA features as inputs to neural networks.
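RASTA filtering is usually described as band-pass filtering the time trajectory of each log (or otherwise compressed) spectral band. The sketch below assumes the filter commonly quoted for RASTA, H(z) = 0.1 z^4 (2 + z^-1 - z^-3 - 2z^-4) / (1 - 0.98 z^-1), implemented causally so that the output lags the input by four frames. The function name, the 0.98 pole, and the 100 frames/s rate used in the demo are assumptions, not details of CSLU's implementation. Because the numerator coefficients sum to zero, a constant offset in the log domain (which is what a fixed linear channel contributes) decays away, while modulations of a few hertz pass through.

```python
import math


def rasta_filter(trajectory, pole=0.98):
    """Band-pass filter one log-energy trajectory (one value per frame).

    Causal form of H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1);
    the z^4 advance of the textbook form is dropped, so the output lags by 4 frames.
    """
    # FIR coefficients 0.1 * [2, 1, 0, -1, -2]: they sum to zero, so the DC gain is zero.
    numerator = [0.2, 0.1, 0.0, -0.1, -0.2]
    filtered = []
    previous_output = 0.0
    for t in range(len(trajectory)):
        # Weighted difference of the current and four previous frames
        # (frames before the start of the utterance are treated as zero).
        fir = sum(b * trajectory[t - k] for k, b in enumerate(numerator) if t - k >= 0)
        # Single-pole recursion restores slowly varying (but not constant) components.
        previous_output = fir + pole * previous_output
        filtered.append(previous_output)
    return filtered


if __name__ == "__main__":
    frames = 500  # 5 s at an assumed 100 frames/s
    # A 4 Hz modulation of one band's log energy, with and without a constant channel offset.
    speech = [math.sin(2 * math.pi * 4.0 * t / 100.0) for t in range(frames)]
    through_channel = [x + 3.0 for x in speech]
    clean, offset = rasta_filter(speech), rasta_filter(through_channel)
    # The two outputs converge as the offset's contribution decays geometrically.
    print(abs(clean[-1] - offset[-1]))
```

In practice the same filter would be applied independently to every spectral band, and the filtered trajectories (possibly after further transformation) used as features, for example as neural network inputs.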
Phonetic recognition. More accurate estimation of phoneme probabilities is a major goal at CSLU. Proposed efforts focus on more precise context-dependent modeling, better training techniques for neural networks, and explicit modeling of speech dynamics. Together, these efforts provide a needed research alternative to hidden Markov models (HMMs), which have already been studied extensively.
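Neural network phonetic classifiers are typically trained so that their outputs estimate the posterior probability of each phoneme given a window of acoustic frames. The following is a minimal sketch under that assumption: a single-layer softmax network over a stacked window of neighboring frames, trained by gradient descent on the cross-entropy loss. The class and function names, the toy features, and the learning rate are hypothetical; practical classifiers add hidden layers, far larger inputs, and many thousands of training frames.

```python
import math
import random


def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stack_context(frames, t, width):
    """Concatenate frames t-width..t+width, repeating the edge frame at utterance boundaries."""
    stacked = []
    for k in range(t - width, t + width + 1):
        idx = min(max(k, 0), len(frames) - 1)
        stacked.extend(frames[idx])
    return stacked


class PhonemeClassifier:
    """Single-layer softmax network whose outputs estimate P(phoneme | input window)."""

    def __init__(self, n_inputs, n_phonemes, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(n_phonemes)]
        self.biases = [0.0] * n_phonemes

    def posteriors(self, x):
        scores = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.weights, self.biases)]
        return softmax(scores)

    def train_step(self, x, target, lr=0.1):
        """One gradient-descent step on the cross-entropy loss -log P(target | x)."""
        probs = self.posteriors(x)
        for j, row in enumerate(self.weights):
            # d(loss)/d(score_j) = P(j | x) - [j == target]
            grad = probs[j] - (1.0 if j == target else 0.0)
            for i, xi in enumerate(x):
                row[i] -= lr * grad * xi
            self.biases[j] -= lr * grad


if __name__ == "__main__":
    frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # toy 2-dimensional feature frames
    x = stack_context(frames, t=1, width=1)       # 3 frames x 2 features = 6 inputs
    net = PhonemeClassifier(n_inputs=len(x), n_phonemes=4)
    for _ in range(50):
        net.train_step(x, target=2)
    print([round(p, 3) for p in net.posteriors(x)])
```

Because the outputs are normalized by the softmax, they can be read directly as phoneme probability estimates for each frame; improving how accurately they match the true posteriors is the kind of goal the paragraph above describes.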
Vocabulary-independent recognition. Rapid prototyping of spoken dialogue systems depends critically on vocabulary-independent recognition, in which the system recognizes words or phrases for which no training data are available. Vocabulary-independent recognition requires research leading to the automatic generation of expected word pronunciations that model phonological variation, dialects, accents, and the errors produced by the phonetic recognizer.
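One way to generate expected pronunciations automatically is to apply optional phonological rules to a base (dictionary) pronunciation and keep every variant they produce. The sketch below assumes ARPAbet-style phone symbols and two hypothetical rules, intervocalic flapping of /t/ and /d/ and deletion of word-final /t/; it illustrates the idea and is not CSLU's rule set.

```python
from itertools import product

# Assumed ARPAbet-style vowel inventory (partial, for illustration only).
VOWELS = {"aa", "ae", "ah", "ao", "eh", "er", "ey", "ih", "iy", "ow", "uw"}


def alternatives(phones, i):
    """Return the possible realizations of phones[i]; the rules are hypothetical and optional."""
    phone = phones[i]
    prev = phones[i - 1] if i > 0 else None
    nxt = phones[i + 1] if i + 1 < len(phones) else None
    options = [(phone,)]  # the canonical realization is always allowed
    if phone in ("t", "d") and prev in VOWELS and nxt in VOWELS:
        options.append(("dx",))  # intervocalic flap, as in "water"
    if phone == "t" and nxt is None:
        options.append(())  # word-final /t/ deletion
    return options


def pronunciations(phones):
    """Expand a base pronunciation into the set of all rule-generated variants."""
    per_position = [alternatives(phones, i) for i in range(len(phones))]
    return {
        tuple(p for segment in choice for p in segment)
        for choice in product(*per_position)
    }


if __name__ == "__main__":
    for variant in sorted(pronunciations(["w", "ao", "t", "er"])):
        print(" ".join(variant))
```

Rules for dialects, accents, or frequent phonetic recognizer confusions could be added the same way, as extra optional alternatives at each position; attaching a probability to each alternative would let the recognizer prefer the more likely variants.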
Rejection of extraneous speech. Rejecting out-of-vocabulary words and background sounds is a key problem that must be solved to build systems that are robust and fail gracefully. Improved metrics must be devised for rejecting out-of-vocabulary utterances.
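A common family of rejection metrics compares the score of the best in-vocabulary hypothesis with the score of a filler (garbage) model over the same frames. This sketch assumes per-frame probabilities for both are already available (for instance, the filler probability might be the average of the top few phoneme probabilities in each frame) and normalizes the log-likelihood ratio by duration so that long and short words can share one threshold. The function names and the default threshold of 0.0 are hypothetical and would need tuning on held-out data.

```python
import math


def rejection_score(word_frame_probs, filler_frame_probs):
    """Duration-normalized log-likelihood ratio of a word hypothesis vs. a filler model.

    Both arguments hold one probability per frame, aligned over the same segment.
    """
    if not word_frame_probs or len(word_frame_probs) != len(filler_frame_probs):
        raise ValueError("need equal-length, non-empty per-frame probability lists")
    log_ratio = sum(math.log(w) - math.log(f) for w, f in zip(word_frame_probs, filler_frame_probs))
    # Dividing by the number of frames keeps long words from dominating the score.
    return log_ratio / len(word_frame_probs)


def accept(word_frame_probs, filler_frame_probs, threshold=0.0):
    """Accept the hypothesis only if it beats the filler model by at least the threshold."""
    return rejection_score(word_frame_probs, filler_frame_probs) >= threshold


if __name__ == "__main__":
    word = [0.8, 0.7, 0.9]
    filler = [0.4, 0.35, 0.45]
    print(accept(word, filler))  # True: the word model is twice as likely in every frame
    print(accept(filler, word))  # False
```

Raising the threshold rejects more out-of-vocabulary input at the cost of also rejecting more correct words, so the operating point is a design trade-off rather than a fixed constant.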
Speaker adaptation. Research is proposed on methods that let a system improve its recognition performance while a speaker is using it (on-line adaptation) and after it has been used by many speakers. Systems must improve performance for difficult speakers (those not encountered when the classifiers were trained) without degrading performance for speakers who already receive acceptable performance. Adaptation must occur at several levels. For example, the system may need to adapt to the speaker's vocal tract characteristics by modifying the signal representation, and to the speaker's accent by modifying the word pronunciation models. In addition, the system should adapt off-line (automatically) by training on speech data collected during use and by identifying common responses to prompts, so that the recognition vocabulary and language models can be modified to reflect frequency of use.
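The off-line adaptation described above, adding common responses to a prompt's vocabulary and reweighting the language model by frequency of use, might be sketched as follows. The sketch assumes a per-prompt unigram model over whole response phrases, linear interpolation between the prior model and observed relative frequencies, and a hypothetical `min_count` threshold for admitting new responses; none of these choices come from the original text.

```python
from collections import Counter


def adapt_prompt_model(prior, observed_responses, weight=0.5, min_count=3):
    """Re-estimate the response distribution for one prompt from usage data.

    prior: dict mapping response phrase -> probability (assumed to sum to 1).
    observed_responses: responses to this prompt collected during use.
    weight: share of probability mass given to observed frequencies, in [0, 1).
    Returns (vocabulary, probabilities).
    """
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    counts = Counter(observed_responses)
    total = sum(counts.values())
    if total == 0:
        return set(prior), dict(prior)
    # Admit responses heard often enough, even if the prior vocabulary lacked them.
    vocabulary = set(prior) | {r for r, c in counts.items() if c >= min_count}
    mixed = {
        r: (1.0 - weight) * prior.get(r, 0.0) + weight * counts[r] / total
        for r in vocabulary
    }
    # Rare, unadmitted responses lose their mass; renormalize over the vocabulary.
    norm = sum(mixed.values())
    return vocabulary, {r: p / norm for r, p in mixed.items()}


if __name__ == "__main__":
    prior = {"yes": 0.5, "no": 0.5}
    heard = ["yes"] * 6 + ["no"] * 2 + ["maybe"] * 3 + ["uh"]
    vocab, probs = adapt_prompt_model(prior, heard)
    print(sorted(vocab))  # ['maybe', 'no', 'yes']
    print({r: round(p, 3) for r, p in sorted(probs.items())})
```

Keeping part of the prior mass (`weight` below 1) prevents a small batch of usage data from wiping out responses that were expected but have not yet been heard.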
The main research areas at CSLU include: