A COMPUTATIONAL MODEL FOR SOUND LOCALIZATION

Richard O. Duda

Department of Electrical Engineering
San Jose State University
San Jose, CA 95192

CONTACT INFORMATION

e-mail: duda@best.com
phone: (408) 924-3917
fax: (408) 924-3925

WWW PAGE

http://www-engr.sjsu.edu/electeng/faculty/duda/

PROGRAM AREA

3. Other Communication Modalities.

KEYWORDS

Sound localization, 3D sound, spatial hearing, sound separation, auditory scene analysis, head-related transfer functions

PROJECT SUMMARY

The goal of our research is to create a model of the process by which people locate sounds in three dimensions. Emphasis is placed on explaining well-established but still incompletely understood psychoacoustic phenomena. One example is our ability to locate sounds coming from above or below, despite the fact that there are no interaural differences between the sounds that reach the two ears. Another is our ability to locate sounds in reverberant environments that contain multiple sound sources, where echoes and reflections act as additional, virtual sources. A third is our ability to judge distance, since loudness alone is not an adequate cue.

The model is based on the physical effects of sound propagation, and on neurophysiological studies that have traced the auditory pathways from the cochlea to the auditory cortex. It extends existing computational models of the cochlea by incorporating monaural and binaural, temporally based correlation methods to extract the information needed for source localization. If successful, this work should significantly improve the ability of computers to recognize speech or other sounds as they occur in everyday, multisource environments, thereby extending the range of effective human-machine interaction.
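
To make the binaural correlation idea concrete, the following is a minimal sketch of azimuth estimation from the interaural time difference (ITD), recovered by cross-correlating the two ear signals. It is an illustration only, not this project's model; the sampling rate, head radius, and the crude sine-law ITD inversion are all simplifying assumptions.

    import numpy as np

    FS = 44100              # sampling rate in Hz (assumed)
    HEAD_RADIUS = 0.0875    # nominal head radius in meters (assumed)
    SPEED_OF_SOUND = 343.0  # meters per second

    def estimate_itd(left, right, max_itd=0.0008):
        # Find the lag (in samples) that maximizes the cross-correlation
        # of the two ear signals; trim the ends so np.roll's wrap-around
        # samples never enter the dot product.
        max_lag = int(max_itd * FS)
        lags = np.arange(-max_lag, max_lag + 1)
        core = slice(max_lag, -max_lag)
        scores = [np.dot(left[core], np.roll(right, -lag)[core]) for lag in lags]
        return lags[int(np.argmax(scores))] / FS  # seconds; > 0 if right lags

    def itd_to_azimuth(itd):
        # Invert the crude sine-law approximation ITD = (2a/c) sin(theta).
        # Positive ITD (right ear lagging) maps to a source on the left.
        arg = np.clip(itd * SPEED_OF_SOUND / (2 * HEAD_RADIUS), -1.0, 1.0)
        return np.degrees(np.arcsin(arg))

    # Toy test: white noise delayed by 5 samples at the right ear.
    noise = np.random.randn(FS // 10)
    delay = 5
    right = np.concatenate([np.zeros(delay), noise[:-delay]])
    itd = estimate_itd(noise, right)
    print(f"ITD = {itd * 1e6:.0f} microseconds, azimuth = {itd_to_azimuth(itd):.1f} degrees")

A full model of the kind described above would compute such correlations within individual cochlear frequency channels rather than on the broadband waveforms.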

PROJECT REFERENCES

R. O. Duda, "Estimating azimuth and elevation from the interaural head-related transfer function," in R. H. Gilkey and T. B. Anderson, Eds., Binaural and Spatial Hearing (Lawrence Erlbaum Associates, Hillsdale, NJ, in press).

R. O. Duda, "Binaural hearing demonstrations," Acta Acustica (in press).

W. Chau and R. O. Duda, "Combined monaural and binaural localization of sound sources," Proc. Twenty-Ninth Asilomar Conference on Signals, Systems and Computers (Asilomar, CA, November, 1995).

R. O. Duda, "Connectionist models for auditory scene analysis," in J. D. Cowan, G. Tesauro and J. Alspector, Eds., Advances in Neural Information Processing Systems -6-, pp. 1069-1076 (Morgan Kaufmann, San Francisco, 1994).

C. Lim and R. O. Duda, "Estimating the azimuth and elevation of a sound source from the output of a cochlear model," Proc. Twenty-Eighth Asilomar Conference on Signals, Systems and Computers (Asilomar, CA, October, 1994).

T. Shawan and R. O. Duda, "Adjacent-channel inhibition in acoustic onset detection," Proc. Twenty-Eighth Asilomar Conference on Signals, Systems and Computers (Asilomar, CA, October, 1994).

R. O. Duda, "Modeling head related transfer functions," Proc. Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, pp. 457-461 (Asilomar, CA, October, 1994).

AREA BACKGROUND

Sound localization is part of what is called auditory scene analysis -- the decomposition of sound into source components and the characterization of the acoustic environment. On the input side, the goal is to allow computers to cope with speech and other sounds that are encountered in everyday, multisource, reverberant environments. On the output side, the goal is to provide effective ways to synthesize realistic spatial sounds.

The basis for sound localization comes from three scientific areas: acoustics, auditory neurophysiology, and psychoacoustics. The physical cues for localizing sources are captured by the so-called head-related transfer function, which measures the directional dependence of the diffraction of incident sound waves by the torso, head and outer ears. Studies of the neural pathways from the cochlea to the auditory cortex provide inspiration for both the structure of a localization model and the kinds of signal processing that are appropriate, helping to define parameters such as filter bandwidths, response times, and compressive nonlinearities to cope with dynamic range. Studies in psychoacoustics reveal the different kinds of cues that humans use to localize sources, and human abilities to deal with echoes and reverberation. Our effort involves synthesizing this information in computational models of sound localization.
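
As a concrete illustration of how the head-related transfer function is used on the output side, the following sketch renders a mono signal binaurally by convolving it with left-ear and right-ear head-related impulse responses (HRIRs), the time-domain form of the HRTF. The HRIRs below are crude synthetic stand-ins consisting only of an interaural delay and level difference; measured HRIRs would also carry the pinna's direction-dependent spectral shaping that makes elevation audible.

    import numpy as np
    from scipy.signal import fftconvolve

    def render_binaural(mono, hrir_left, hrir_right):
        # Convolve the source with each ear's impulse response and pack
        # the results into an (n_samples, 2) stereo array.
        left = fftconvolve(mono, hrir_left)
        right = fftconvolve(mono, hrir_right)
        n = max(len(left), len(right))
        stereo = np.zeros((n, 2))
        stereo[:len(left), 0] = left
        stereo[:len(right), 1] = right
        return stereo

    # Stand-in HRIRs for a source off to the left: the right (far) ear
    # hears the sound later and quieter. These are placeholders for
    # measured responses, not real HRTF data.
    hrir_left = np.zeros(64)
    hrir_left[0] = 1.0
    hrir_right = np.zeros(64)
    hrir_right[20] = 0.6

    mono = np.random.randn(44100)  # one second of noise at 44.1 kHz
    stereo = render_binaural(mono, hrir_left, hrir_right)

Replacing the stand-in filters with measured HRIRs for many directions turns these same few lines into a basic 3-D audio renderer.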

AREA REFERENCES

D. Begault, 3-D Sound for Virtual Reality and Multimedia (Academic Press, Boston, MA, 1994). An elementary but very clear presentation of 3-D audio principles and current technology.

J. Blauert, Spatial Hearing (MIT Press, Cambridge, MA, 1983). The standard reference on the psychophysics of three-dimensional hearing.

A. S. Bregman, Auditory Scene Analysis (MIT Press, Cambridge, MA, 1990). A massive description of experiments by the author and his students on the factors that influence the formation and segregation of sound streams.

J. C. Middlebrooks and D. M. Green, "Sound localization by human listeners," Annu. Rev. Psychol., Vol. 42, pp. 135-159 (1991). An excellent review of the abilities of people to localize sound. Highly recommended.

J. O. Pickles, An Introduction to the Physiology of Hearing (2nd Ed.) (Academic Press, London, 1988). Describes in detail most of what is known of the neurophysiological processes along the path from the cochlea to the auditory cortex.

W. A. Yost, "Perceptual models for auditory localization," Proc. AES 12th Int. Conf. (Copenhagen, Denmark, June 1993). An excellent survey of prior modeling work.

RELATED PROGRAM AREAS

Speech and Natural Language Understanding

Intelligent Interactive Systems for Persons with Disabilities

Virtual Environments

POTENTIAL RELATED PROJECTS

1. Speech recognition in natural environments. Early work by Weintraub and more recent work by Bodden and Blauert suggest that the techniques of auditory scene analysis, though computationally expensive, have the potential to cope with interfering sounds for which a random-noise model is inadequate.

2. Sound localization for the hearing impaired. By feeding the outputs of a real-time sound localizer to appropriate tactile or visual displays, one could provide the hearing impaired with information about objects and events that are out of sight.

3. Synthesis of 3D sound. Customized signal-processing models of the head-related transfer function promise relatively inexpensive and very effective ways to synthesize compelling 3D sound for auditory displays and virtual reality; a simple parametric sketch follows below.
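
As an illustration of such a parametric model, the following sketch combines a Woodworth-style interaural delay with a first-order head-shadow filter per ear, in the spirit of spherical-head approximations. The filter form, the alpha mapping, and all constants are assumptions chosen for illustration, not necessarily those used in this project.

    import numpy as np
    from scipy.signal import lfilter

    FS = 44100      # sampling rate in Hz (assumed)
    A = 0.0875      # head radius in meters (assumed)
    C = 343.0       # speed of sound in meters per second

    def head_shadow(x, angle_deg):
        # First-order filter H(s) = (alpha*s + beta) / (s + beta):
        # unity gain at DC, gain alpha at high frequencies, so a
        # shadowed ear (small alpha) loses treble, as diffraction
        # around the head predicts. Discretized with the bilinear
        # transform; the alpha mapping below is an assumed heuristic.
        beta = 2.0 * C / A
        alpha = 1.05 + 0.95 * np.cos(np.radians(angle_deg))
        k = 2.0 * FS
        b = [(alpha * k + beta) / (k + beta), (beta - alpha * k) / (k + beta)]
        a = [1.0, (beta - k) / (k + beta)]
        return lfilter(b, a, x)

    def render(mono, azimuth_deg):
        # Render a mono signal at a given azimuth (0 = front, 90 = left).
        theta = np.radians(abs(azimuth_deg))
        itd = (A / C) * (theta + np.sin(theta))   # Woodworth's formula
        lag = int(round(itd * FS))
        near = head_shadow(mono, 90 - abs(azimuth_deg))  # ear facing source
        far = head_shadow(mono, 90 + abs(azimuth_deg))   # shadowed ear
        far = np.concatenate([np.zeros(lag), far])[:len(mono)]
        left, right = (near, far) if azimuth_deg >= 0 else (far, near)
        return np.stack([left, right], axis=1)

    stereo = render(np.random.randn(FS), 45.0)  # noise burst 45 degrees left

Because the whole model is one delay plus one first-order filter per ear, it runs cheaply in real time, which is its appeal over tables of stored, measured HRTFs.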