Department of Electrical Engineering
San Jose State University
San Jose, CA 95192
The model is based on the physical effects of sound propagation and on neurophysiological studies that have traced the auditory pathways from the cochlea to the auditory cortex. It extends existing computational models of the cochlea by incorporating monaural and binaural, temporally based correlation methods to extract the information needed for source localization. If successful, this work should significantly improve the ability of computers to recognize speech and other sounds as they occur in everyday, multisource environments, thereby extending the range of effective human-machine interaction.
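As a rough illustration of what a binaural, temporally based correlation method can look like, the sketch below estimates the interaural time difference (ITD) by finding the lag that maximizes the cross-correlation of the two ear signals. It is a generic textbook technique, not the project's model: it works on raw waveforms rather than cochlear-model outputs, and the function name `estimate_itd`, the 44.1 kHz sample rate, and the synthetic test pulse are all assumptions made for the example.

```python
import math

def estimate_itd(left, right, sample_rate, max_lag):
    """Return the lag (in seconds) that maximizes the cross-correlation
    of two equal-length ear signals.

    A positive result means the right-ear signal lags the left-ear signal,
    i.e. the sound reached the left ear first.
    """
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate

# Hypothetical test signal: a short decaying pulse that arrives
# 10 samples later at the right ear than at the left ear.
pulse = [math.exp(-k / 8.0) * math.sin(2 * math.pi * k / 16.0) for k in range(64)]
left = [0.0] * 256
right = [0.0] * 256
for k, value in enumerate(pulse):
    left[50 + k] = value
    right[60 + k] = value

itd = estimate_itd(left, right, sample_rate=44100, max_lag=40)
print(round(itd * 1e6))  # ITD in microseconds, about 227
```

The search range of 40 samples (roughly 0.9 ms at the assumed sample rate) is chosen only to cover the delays a human-sized head can produce; a cochlear-model front end would typically run such a correlation separately in each frequency channel.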
R. O. Duda, "Binaural hearing demonstrations," Acta Acustica (in press).
W. Chau and R. O. Duda, "Combined monaural and binaural localization of sound sources," Proc. Twenty-Ninth Asilomar Conference on Signals, Systems and Computers (Asilomar, CA, November, 1995).
R. O. Duda, "Connectionist models for auditory scene analysis," in J. D. Cowan, G. Tesauro and J. Alspector, Eds., Advances in Neural Information Processing Systems 6, pp. 1069-1076 (Morgan Kaufmann, San Francisco, 1994).
C. Lim and R. O. Duda, "Estimating the azimuth and elevation of a sound source from the output of a cochlear model," Proc. Twenty-Eighth Asilomar Conference on Signals, Systems and Computers (Asilomar, CA, October, 1994).
T. Shawan and R. O. Duda, "Adjacent-channel inhibition in acoustic onset detection," Proc. Twenty-Eighth Asilomar Conference on Signals, Systems and Computers (Asilomar, CA, October, 1994).
R. O. Duda, "Modeling head related transfer functions," Proc. Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, pp. 457-461 (Asilomar, CA, October, 1993).
The basis for sound localization comes from three scientific areas: acoustics, auditory neurophysiology, and psychoacoustics. The physical cues for localizing sources are captured by the so-called head-related transfer function, which measures the directional dependence of the diffraction of incident sound waves by the torso, head and outer ears. Studies of the neural pathways from the cochlea to the auditory cortex provide inspiration for both the structure of a localization model and the kinds of signal processing that are appropriate, helping to define parameters such as filter bandwidths, response times, and compressive nonlinearities to cope with dynamic range. Studies in psychoacoustics reveal the different kinds of cues that humans use to localize sources, and human abilities to deal with echoes and reverberation. Our effort involves synthesizing this information in computational models of sound localization.
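To make the idea of a direction-dependent physical cue concrete, the sketch below evaluates Woodworth's classical spherical-head approximation for the interaural time difference, ITD = (a/c)(theta + sin theta), where a is the head radius, c the speed of sound, and theta the source azimuth measured from straight ahead. This is a standard textbook formula for a rigid sphere and a distant source, not the head-related transfer function model developed in this project; the head radius and speed of sound used here are assumed typical values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
HEAD_RADIUS = 0.0875    # m, assumed typical adult head radius

def woodworth_itd(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Interaural time difference (seconds) for a distant source.

    azimuth_deg is measured from straight ahead (0) toward one ear (90).
    The formula is only valid for azimuths between 0 and 90 degrees.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must be between 0 and 90")
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))

for azimuth in (0, 30, 60, 90):
    print(azimuth, round(woodworth_itd(azimuth) * 1e6), "microseconds")
```

With these assumed values the ITD grows from zero straight ahead to roughly 0.66 ms at 90 degrees, which sets the scale of the response times a localization model has to resolve. The formula says nothing about elevation or the spectral cues contributed by the torso and outer ears.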
J. Blauert, Spatial Hearing (MIT Press, Cambridge, MA, 1983). The standard reference on the psychophysics of three-dimensional hearing.
A. S. Bregman, Auditory Scene Analysis (MIT Press, Cambridge, MA, 1990). A massive description of experiments by the author and his students on the factors that influence the formation and segregation of sound streams.
J. C. Middlebrooks and D. M. Green, "Sound localization by human listeners," Annu. Rev. Psychol., Vol. 42, pp. 135-159 (1991). An excellent review of the abilities of people to localize sound. Highly recommended.
J. O. Pickles, An Introduction to the Physiology of Hearing (2nd Ed.) (Academic Press, London, 1988). Describes in detail most of what is known of the neurophysiological processes along the path from the cochlea to the auditory cortex.
W. A. Yost, "Perceptual models for auditory localization," Proc. AES 12th Int. Conf. (Copenhagen, Denmark, June 1993). An excellent survey of prior modeling work.
Intelligent Interactive Systems for Persons with Disabilities
Virtual Environments.
2. Sound localization for the hearing impaired. By feeding the outputs of a real-time sound localizer to appropriate tactile or visual displays, one could provide the hearing impaired with information about objects and events that are out of sight.
3. Customized, signal-processing models of the head-related transfer function promise relatively inexpensive and very effective ways to synthesize compelling 3D sound for auditory displays and VR.
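The simplest possible form of such synthesis can be sketched as follows: a mono signal is turned into a left/right pair by delaying and attenuating the ear farther from the source. This is a much cruder stand-in than a head-related transfer function model, since it captures only an interaural delay and a broadband level difference and no outer-ear or elevation cues. The function name `spatialize`, the spherical-head delay formula, and the 6 dB maximum level difference are assumptions chosen for illustration.

```python
import math

def spatialize(mono, sample_rate, azimuth_deg,
               head_radius=0.0875, c=343.0, max_ild_db=6.0):
    """Return (left, right) sample lists for a source at azimuth_deg.

    azimuth_deg lies in [-90, 90]; positive values place the source on
    the listener's right. The far ear is delayed by a spherical-head ITD
    and attenuated by up to max_ild_db (an assumed, illustrative value).
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must be between -90 and 90")
    theta = math.radians(abs(azimuth_deg))
    itd = (head_radius / c) * (theta + math.sin(theta))
    delay = int(round(itd * sample_rate))          # far-ear delay in samples
    far_gain = 10.0 ** (-max_ild_db * math.sin(theta) / 20.0)
    near = list(mono) + [0.0] * delay              # pad so both ears match in length
    far = [0.0] * delay + [far_gain * x for x in mono]
    if azimuth_deg >= 0:
        return far, near   # source on the right: the left ear is the far ear
    return near, far

left, right = spatialize([1.0, 0.5], sample_rate=44100, azimuth_deg=90)
print(len(left), left.index(max(left)))  # output length and left-ear onset sample
```

A customized model of the kind described above would replace the single delay and gain with direction-dependent filters fitted to the individual listener.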