MODELING SPEECH PRODUCTION:
FORMANT AND ARTICULATORY SPEECH SYNTHESIS

Donald G. Childers

Mind-Machine Interaction Research Center
University of Florida
Department of Electrical and Computer Engineering
405 CSE, Bldg. 42
P.O. Box 116130
Gainesville, FL 32611-6130

CONTACT INFORMATION

Telephone: (904) 392-2633

Fax: (904) 392-0044

E-mail: childers@drwho.ee.ufl.edu

WWW PAGE

http://www.eel.ufl.edu/~childers

PROGRAM AREA

Speech and Natural Language Understanding.

KEYWORDS

Speech synthesis, speech analysis, speech quality, vocal fold modeling, articulatory speech synthesis, formant speech synthesis

PROJECT SUMMARY

Present speech analysis and synthesis methods use models that are often poorly correlated with the mechanisms of human speech production. To help alleviate this problem, our research is directed toward adding a new dimension to measuring and modeling articulatory movement. We model laryngeal function and vocal tract characteristics using parameters extracted from the acoustic signal that have proven to be significant for synthesizing aspects of vocal quality. Our purpose is to develop an interactive model of the phonatory and resonance characteristics of speech production. The researcher will be able to test hypotheses about vocal quality by "varying" acoustic, anatomical, and physiological parameters of the model. The various features of the model may be validated by evaluating speech that is synthesized using the characteristics of the articulatory model. The advantages of this approach are that it is non-invasive and that it provides a dynamic model of glottal and vocal tract characteristics related to the physiological, anatomical, acoustical, and perceptual aspects of voice; furthermore, the model may be evaluated and validated using speech synthesis techniques.
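
As a concrete illustration of the kind of glottal-source parameters described above, the short Python sketch below generates a Rosenberg-style glottal pulse train whose pitch, open quotient, and speed quotient can be varied. It is a minimal sketch under our own assumptions, not the project's glottal model, and the function and parameter names are ours.

    # Minimal sketch (not the project's software): a Rosenberg-style glottal
    # pulse train whose pitch period, open quotient, and speed quotient can be
    # varied to explore how glottal-source settings affect voice quality.
    import numpy as np

    def rosenberg_pulse_train(f0=110.0, fs=16000, duration=1.0,
                              open_quotient=0.6, speed_quotient=3.0):
        """Glottal volume-velocity waveform built from Rosenberg pulses.

        open_quotient  : fraction of each pitch period the glottis is open
        speed_quotient : ratio of opening-phase to closing-phase duration
        """
        period = int(round(fs / f0))                 # samples per pitch period
        n_open = int(round(open_quotient * period))  # open-phase length
        n_rise = int(round(n_open * speed_quotient / (speed_quotient + 1.0)))
        n_fall = n_open - n_rise

        pulse = np.zeros(period)
        t_rise = np.arange(n_rise)                   # rising (opening) phase
        pulse[:n_rise] = 0.5 * (1.0 - np.cos(np.pi * t_rise / max(n_rise, 1)))
        t_fall = np.arange(n_fall)                   # falling (closing) phase
        pulse[n_rise:n_open] = np.cos(np.pi * t_fall / (2.0 * max(n_fall, 1)))

        n_periods = int(duration * fs / period)
        return np.tile(pulse, n_periods)

    if __name__ == "__main__":
        g = rosenberg_pulse_train(f0=110.0, open_quotient=0.5)  # a "pressed" setting
        print(g.shape, round(float(g.max()), 3))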

Illustrative results include the ability to relate vocal fold mass and length, aperiodicity of vocal fold motion, and glottal area to vocal quality factors. A unique application of these results is a possible training aid for the hearing impaired. Another potential long-range application is speech coding based on segmenting speech into phonetically related intervals synchronized with articulatory movement, which could provide a low-bandwidth, high-quality speech coding scheme.
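
To make the notion of aperiodicity of vocal fold motion concrete, the sketch below computes local jitter (cycle-to-cycle period perturbation) and shimmer (cycle-to-cycle amplitude perturbation), two common acoustic correlates of irregular vocal fold vibration. It assumes per-cycle period and amplitude estimates are already available from a prior pitch analysis; it is illustrative only and is not part of the project's analysis system.

    # Illustrative only: local jitter and shimmer, two common aperiodicity
    # measures that relate irregular vocal-fold vibration to perceived quality.
    # Inputs are assumed to be per-cycle period lengths (seconds) and per-cycle
    # peak amplitudes obtained from some prior pitch analysis.
    import numpy as np

    def local_jitter(periods):
        """Mean absolute difference of consecutive periods / mean period."""
        periods = np.asarray(periods, dtype=float)
        return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

    def local_shimmer(amplitudes):
        """Mean absolute difference of consecutive amplitudes / mean amplitude."""
        amplitudes = np.asarray(amplitudes, dtype=float)
        return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        T0 = 1.0 / 110.0                              # nominal pitch period
        periods = T0 * (1.0 + 0.01 * rng.standard_normal(50))
        amps = 1.0 + 0.05 * rng.standard_normal(50)
        print(f"jitter  = {local_jitter(periods):.4f}")
        print(f"shimmer = {local_shimmer(amps):.4f}")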

PROJECT REFERENCES

Childers, D.G. and Wu, K., Quality of Speech Produced by Analysis-Synthesis, Speech Communication, vol. 9, no. 1, 1990, pp. 97-117.

Childers, D.G. and Ding, C., Articulatory Synthesis: Nasal Sounds and Male and Female Voices, Journal of Phonetics, vol. 19, 1991, pp. 453-464.

Wu, K. and Childers, D.G., Gender Recognition From Speech. Part I: Coarse Analysis, J. Acoust. Soc. Am., vol. 90, October, 1991, pp. 1828-1840.

Childers, D.G. and Wu, K., Gender Recognition From Speech. Part II: Fine Analysis, J. Acoust. Soc. Am., vol. 90, October, 1991, pp. 1841-1856.

Childers, D.G. and Lee, C-K., Vocal Quality Factors: Analysis, Synthesis, and Perception, J. Acoust. Soc. Am., vol. 90, November, 1991, pp. 2394-2410.

Lalwani, A.L. and Childers, D.G., Modeling Vocal Disorders Via Formant Synthesis, ICASSP, 1991, pp. 505-508.

Lalwani, A.L. and Childers, D.G., A Flexible Formant Synthesizer, ICASSP, 1991, pp. 777-780.

Prado, P.P.L., Shiva, E.H., and Childers, D.G., Optimization of Acoustic-to-Articulatory Mapping, ICASSP, 1992, pp. II-33 to II-36.

Childers, D.G. and Bae, K.S., Detection of Laryngeal Function Using Speech and Electroglottographic Data, IEEE Transactions on Biomedical Engineering, vol. 39, no. 1, January, 1992, pp. 19-25.

Childers, D. G., Signal Processing Methods for the Assessment of Vocal Disorders, Medical and Life Sciences Engineering, India (Special Issue on Biomedical Signal Processing), 1994.

Childers, D. G. and Wong, C. F., Measuring and Modeling Vocal Source-Tract Interaction, IEEE Trans. Biomed. Engr., vol. 41, June, 1994, pp. 663-671.

Childers, D. G. and Hu, H. T., Speech Synthesis by Glottal Linear Prediction, J. Acoust. Soc. Am., vol. 96, October, 1994, pp. 2026-2036.

Childers, D. G. and Ahn, C., Modeling the Glottal Volume-velocity Waveform for Three Voice Types, J. Acoust. Soc. Am., vol. 97, January, 1995, pp. 505-519.

Childers, D. G., Principe, J. C., and Ting, Y. T., Adaptive WRLS-VFF for Speech Analysis, IEEE Trans. Speech and Audio Processing, vol. 3, May, 1995, pp. 209-213.

Childers, D. G., Glottal Source Modeling for Voice Conversion, Speech Communication, vol. 16, 1995, pp. 127-138.

AREA BACKGROUND

Our research deals with the development of models of speech production and is therefore concerned primarily with speech synthesis and speech analysis. Several long-range goals of the research are to model various voice types, including breathy, hoarse, harsh, and vocal fry voices. We are also modeling voice disorders of both functional and organic origin. Within this framework, one goal is to create new voices, which one might call a Mel Blanc voice synthesizer. Mel Blanc was the vocal artistic talent behind such voices as Porky Pig, Daffy Duck, Bugs Bunny, and others. The process of voice creation (modeling) entails understanding the vocal features that correlate with a speaker's age, gender, emotional state, dialect, and health. In addition, this research provides insight into the development of quantitative measures for assessing voice quality and voice normalization, and assists in understanding subtle aspects of speaker-dependent and speaker-independent speech recognition. Our research is also directed at methods for modeling laryngeal (vocal fold) function to determine the significance of its role in voice quality. To accomplish these goals we are developing three speech synthesizers that have interactive, graphical user interfaces.

These synthesizers are based on linear predictive coding, formant synthesis, and articulatory synthesis. The articulatory synthesizer adjusts models of the position of the tongue, lips, jaw, velum, and other articulators to mimic words produced by humans. We use these synthesizers to test hypotheses concerning human speech production. The experimental data for our studies are contained in a database we have collected over the years from patients with vocal disorders, individuals with various voice types, and male and female speakers. This database is constantly undergoing extensive analysis to measure features considered important for voice modeling. To accomplish this task we have developed an interactive speech analysis software system that measures source-tract interaction, intervals of voiced and unvoiced sounds, and glottal volume-velocity waveforms, and that supports automatic and user-assisted inverse filtering, as well as other speech-related measurements.
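
For readers unfamiliar with formant synthesis, the sketch below shows the basic source-filter idea: a periodic excitation passed through a cascade of second-order digital resonators, one per formant. It is a minimal illustration, not the project's synthesizer; the formant frequencies and bandwidths are rough textbook values for an /a/-like vowel, and the impulse-train source merely stands in for a modeled glottal volume-velocity waveform.

    # A minimal formant-synthesis sketch (not the project's synthesizer): a
    # periodic excitation filtered by a cascade of second-order digital
    # resonators, one per formant.  Formant values are rough /a/-like numbers
    # chosen purely for illustration.
    import numpy as np
    from scipy.signal import lfilter

    def resonator_coeffs(freq, bw, fs):
        """Second-order resonator H(z) = G / (1 - b1*z^-1 - b2*z^-2)."""
        b2 = -np.exp(-2.0 * np.pi * bw / fs)
        b1 = 2.0 * np.exp(-np.pi * bw / fs) * np.cos(2.0 * np.pi * freq / fs)
        gain = 1.0 - b1 - b2                       # unity gain at zero frequency
        return [gain], [1.0, -b1, -b2]             # (numerator, denominator)

    def synthesize_vowel(f0=110.0, fs=16000, duration=0.5,
                         formants=((730, 90), (1090, 110), (2440, 170))):
        n = int(duration * fs)
        source = np.zeros(n)
        source[::int(fs / f0)] = 1.0               # impulse train stands in for
                                                   # a glottal volume-velocity wave
        out = source
        for freq, bw in formants:                  # cascade the formant resonators
            b, a = resonator_coeffs(freq, bw, fs)
            out = lfilter(b, a, out)
        return out / np.max(np.abs(out))

    if __name__ == "__main__":
        y = synthesize_vowel()
        print(len(y))                              # write y to a WAV file to listen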

AREA REFERENCES

P. B. Denes and E. N. Pinson, The Speech Chain, 2nd Edition, W. H. Freeman, 1993 (Paperback).

I. H. Witten, Making Computers Talk, Prentice-Hall, 1986 (Paperback).

L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, 1978.

J. R. Deller, J. G. Proakis, and J. H. L. Hansen, Discrete-Time Processing of Speech Signals, Macmillan, 1993.

RELATED PROGRAM AREAS

4. Adaptive Human Interfaces. 6. Intelligent Interactive Systems for Persons with Disabilities.

POTENTIAL RELATED PROJECTS

None at this time.