Beckman Institute and Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
405 N. Mathews Ave., Urbana, IL 61801
Email: yxz@ifp.uiuc.edu
Phone: (217) 333--2012
Fax: (217) 244--8371
In implementing the plan for speaker adaptation, the initial effort will concentrate on the convergence properties of the proposed feedback self-learning mechanism under various recognition conditions. The study will be conducted both experimentally, in terms of recognition accuracy improvement, and theoretically, in terms of hidden Markov model convergence. The next effort will focus on identifying and modeling the individual attributes of speaker characteristics that cause inter-speaker speech variations, so that these variations can be effectively overcome through speaker adaptation. In the last phase of the research, the adaptation algorithm will be fully integrated into one or more SICSR systems, and computational efficiency will be improved with the aim of real-time adaptive SICSR systems.
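To make the idea of a feedback self-learning loop and its convergence concrete, the following is a toy sketch only, not the project's algorithm. It assumes one-dimensional features, two fixed class means standing in for trained models, and a single additive speaker bias; the function name `self_learning_bias` and all values are hypothetical. Each pass labels the unlabeled frames with the current models, then re-estimates the bias from the residuals, and the loop stops when the estimate no longer changes.

```python
def self_learning_bias(frames, class_means, n_iter=10, tol=1e-9):
    """Iteratively estimate a global additive bias from unlabeled frames.

    Toy assumption: every frame equals one of the class means plus a
    common speaker bias and a small perturbation.
    """
    bias = 0.0
    history = [bias]  # bias estimate after each pass, for inspecting convergence
    for _ in range(n_iter):
        residuals = []
        for x in frames:
            # Recognition step: label the bias-compensated frame with the nearest class mean.
            nearest = min(class_means, key=lambda m: abs((x - bias) - m))
            # Feedback step: residual between the raw frame and its hypothesized class.
            residuals.append(x - nearest)
        new_bias = sum(residuals) / len(residuals)
        history.append(new_bias)
        if abs(new_bias - bias) < tol:
            break  # estimate stopped changing
        bias = new_bias
    return history


# Hypothetical data: class means 0.0 and 4.0, frames shifted by a bias of about 1.0.
class_means = [0.0, 4.0]
frames = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
history = self_learning_bias(frames, class_means)
print(history)
```

In this easy case the first pass already labels every frame correctly, so the estimate settles after one update. With a larger bias or closer class means, early labeling errors can pull the estimate toward a wrong fixed point, which is one reason convergence under various initial conditions merits study.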
In carrying out the plan for environment adaptation, the initial effort will study two-channel configurations and the resulting signal characteristics; parameter estimation for speech, interference, and time-varying channel characteristics will also be implemented and studied. A subsequent effort will focus on combining recognition classification with signal parameter estimation in order to effectively enhance speech spectral features or adapt speech models. In the final phase of the research, the adaptation algorithm will be integrated with an SICSR system, and the system will be evaluated in selected human-computer interaction environments at the Beckman Institute of UIUC, such as the 3D problem-solving environment for structural biology or the CAVE virtual reality environment, where ASR techniques will facilitate applications that require hands-free, eyes-free natural interaction between people and the environment.
During the award period, the PI also plans to initiate education activities in speech processing and recognition, including offering new courses, setting up a laboratory for research and teaching, supervising graduate students' thesis work, participating in multidisciplinary research, and establishing collaborative relations with industry. Both the research and education plans are currently being implemented; some of the results are reflected in the project references listed below. Successful execution of the project plans is expected to benefit the related research and education programs at UIUC.
K. Yen and Y. Zhao, "Co-Channel Speech Separation and Recognition based on Blockwise Decorrelation and Filtering," in preparation.
Y. Zhao, "Robust Speaker Characterization," to appear in Proc. of IEEE 1995 Workshop on Automatic Speech Recognition, Snowbird, Utah, Dec. 1995 (invited).
Y. Zhao, "Hierarchical Mixture Densities and Phonological Rules in Open Vocabulary Speech Recognition," Proc. of the 4th EuroSpeech, pp. 1587--1590, Madrid, Spain, Sept. 1995.
Y. Zhao, T. H. Applebaum, and B. A. Hanson, "Acoustic Normalization of Microphone-Channel Characteristics," Proc. of ICA, Trondheim, Norway, June 1995 (invited).
Y. Zhao, "Iterative Self-Learning Speaker Adaptation Under Various Initial Conditions," Proc. of ICASSP, pp. 712--715, Detroit, MI, May 1995.
Y. Zhao, "An Acoustic-Phonetic Based Speaker-Adaptation Technique for Improving Speaker-Independent Continuous Speech Recognition," IEEE Trans. on Speech and Audio Processing, Vol. 2, No. 3, pp. 380--394, July 1994.
Y. Zhao, "Self-learning Speaker Adaptation Based on Spectral Variation Source Decomposition," Proc. of the 3rd EuroSpeech, pp. 359--362, Berlin, Germany, Sept. 1993.
Y. Zhao, "A Speaker-Independent Continuous Speech Recognition System Using Continuous Mixture Gaussian Density HMM of Phoneme-sized Units," IEEE Trans. on Speech and Audio Processing, Vol. 1, No. 3, pp. 345--361, July 1993.
Speaker adaptation is a technique for speaker robustness that uses a new speaker's speech data to reduce mismatches between the trained models (acoustic and phonological) and the new speaker's characteristics, potentially raising the recognition accuracy of an ASR system to that of a system trained specifically for the speaker. Many approaches to speaker adaptation have been proposed; they can be categorized as supervised vs. unsupervised adaptation, off-line vs. on-line adaptation, data or model transformation vs. model adaptation, data-specific vs. relational adaptation, etc. A brief overview of the adaptation techniques can be found in [3]. Efforts on environment robustness have focused on stationary noise with single-channel speech processing; the approaches include noise filtering, noise-robust distance measures, model adaptation, model augmentation, etc. [4,5,6].
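As a small illustration of how adaptation data can move a speaker-independent model toward a speaker-specific one, the sketch below uses the standard MAP-style interpolation of a Gaussian mean. This is one well-known model adaptation technique chosen here for illustration; it is not stated to be the method of this project. The function name `map_adapt_mean`, the prior weight `tau`, and the data values are all hypothetical.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP-style update of a 1-D Gaussian mean.

    prior_mean: mean from the speaker-independent model.
    frames:     adaptation observations assigned to this Gaussian.
    tau:        assumed prior weight; larger values trust the prior more.
    """
    n = len(frames)
    # Weighted combination of the prior mean and the adaptation data.
    return (tau * prior_mean + sum(frames)) / (tau + n)


si_mean = 0.0  # hypothetical speaker-independent mean
few = map_adapt_mean(si_mean, [2.0] * 5)      # little adaptation data
many = map_adapt_mean(si_mean, [2.0] * 1000)  # much adaptation data
print(few, many)
```

With no data the estimate stays at the speaker-independent mean, and as data accumulate it approaches the speaker's own sample mean, which mirrors the goal of approaching the accuracy of a system trained for that speaker.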
[2] R. Cole et al., "The Challenge of Spoken Language Systems: Research Directions for the Nineties," IEEE Trans. on Speech and Audio Processing, Vol. 3, No. 1, pp. 1--21, Jan. 1995.
[3] Y. Zhao, "Robust Speaker Characterization," to appear in Proc. of IEEE 1995 Workshop on Automatic Speech Recognition, Snowbird, Utah, Dec. 1995.
[4] A. Nadas, D. Nahamoo, and M. A. Picheny, "Speech Recognition using Noise-Adaptive Prototypes," IEEE Trans. on ASSP, Vol. 37, pp. 1495--1503, Oct. 1989.
[5] B.-H. Juang, "Speech Recognition in Adverse Environments," Computer Speech and Language, Vol. 5, pp. 275--294, 1991.
[6] R. C. Rose, E. M. Hofstetter, and D. A. Reynolds, "Integrated Models of Signal and Background with Application to Speaker Identification in Noise," IEEE Trans. on Speech and Audio Processing, Vol. 2, No. 2, pp. 245--258, Apr. 1994.
3. Other Communication Modalities.
4. Adaptive Human Interfaces.
5. Usability and User-Centered Design.
6. Intelligent Interactive Systems for Persons with Disabilities.
3. Human-to-human communication commonly involves the simultaneous use of speech, hand gestures, facial expressions, etc. Research projects on the various modalities of human-computer interaction should therefore address between-modality interactions, in particular the interactions between spoken language and other modalities, which could in turn lead to new techniques in individual modalities.
4. The adaptation of human spoken language to a problem-solving environment is complementary to other human interactive behaviors. Research results in this area can be used to integrate dynamic models of speech and language into ASR systems.
5. Successful ASR applications have depended largely on judicious consideration of human factors. Research projects should include joint studies of human factors with interface design and ASR-technique improvement.
6. ASR technology has great potential for persons with disabilities. Hearing impairment is the number one chronic disability in America, and ASR can provide improved means of speech processing that enhance both the intelligibility and the quality of speech in hearing-aid design.