IMPROVED REAL-TIME AUDIO INTERFACE
FOR HUMAN-COMPUTER INTERACTION

Michael W. Hoffman

University of Nebraska-Lincoln
Department of Electrical Engineering
209N WSEC
Lincoln, NE 68588-0511

CONTACT INFORMATION

email: hoffman@unlinfo.unl.edu

Phone: 402 472-1979

FAX: 402 472-4732

mail: 209N WSEC, Lincoln, NE 68588-0511

WWW PAGE

http://www.engr.unl.edu/~hoffman/

PROGRAM AREA

Speech and Natural Language Understanding.

KEYWORDS

Beamforming, spatial filter, microphone array, speech enhancement, robust beamforming

PROJECT SUMMARY

A three-year project is underway to provide an improved human-machine audio interface. One important component of the research is the application of array processing techniques to improve the quality of the human speech presented to the machine for subsequent processing. Initially, results from current research into multiple-microphone hearing aids will be extended to the similar acoustic problem of a multiple-microphone beamformer for the human-machine interface. These results have demonstrated the effectiveness of a robust constrained adaptive processing technique in the presence of an imperfect array of microphones. Subsequent research will focus on efficient frequency-domain implementations of the robust processor, development of a robust hybrid fixed/adaptive array processor, and the incorporation of an acoustic echo canceller to remove machine-generated output signals from the acoustic input to the machine.
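To make the idea of a robust constrained adaptive processor more concrete, the following is a minimal single-frequency sketch of a minimum-variance beamformer with a unit-gain constraint in the look direction and diagonal loading for robustness. Diagonal loading is a common technique from the robust beamforming literature and is used here only as a stand-in; it is not claimed to be the processor developed in this project. The array geometry, signal powers, loading level, and all function names are assumptions chosen for illustration.

```python
import cmath
import math

def steering_vector(n_mics, spacing_m, angle_deg, freq_hz, c=343.0):
    """Narrowband plane-wave response of a uniform linear array (assumed geometry)."""
    delay = spacing_m * math.sin(math.radians(angle_deg)) / c
    return [cmath.exp(-2j * math.pi * freq_hz * m * delay) for m in range(n_mics)]

def solve(A, b):
    """Solve A x = b for a small complex system by Gaussian elimination with pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = acc / M[r][r]
    return x

def inner(w, v):
    """Beamformer response w^H v."""
    return sum(wi.conjugate() * vi for wi, vi in zip(w, v))

def loaded_mvdr_weights(R, a, loading):
    """Minimum-variance weights with unit gain toward `a`, using a diagonally loaded covariance."""
    n = len(a)
    R_loaded = [[R[i][j] + (loading if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    x = solve(R_loaded, a)      # (R + loading*I)^{-1} a
    scale = inner(a, x)         # a^H (R + loading*I)^{-1} a
    return [xi / scale for xi in x]

# Assumed scenario: 4 microphones, 5 cm spacing, one 1 kHz frequency bin.
N_MICS, SPACING_M, FREQ_HZ = 4, 0.05, 1000.0
look = steering_vector(N_MICS, SPACING_M, 0.0, FREQ_HZ)          # desired talker at broadside
interferer = steering_vector(N_MICS, SPACING_M, 40.0, FREQ_HZ)   # competing source at 40 degrees

# Noise covariance: strong interferer (power 10) plus weak sensor noise (power 0.01).
R = [[10.0 * interferer[i] * interferer[j].conjugate() + (0.01 if i == j else 0.0)
      for j in range(N_MICS)] for i in range(N_MICS)]

w_loaded = loaded_mvdr_weights(R, look, loading=0.1)
w_unloaded = loaded_mvdr_weights(R, look, loading=0.0)

look_gain = abs(inner(w_loaded, look))               # held at 1 by the constraint
interferer_gain = abs(inner(w_loaded, interferer))   # should be far below 1
norm_loaded = sum(abs(wi) ** 2 for wi in w_loaded)
norm_unloaded = sum(abs(wi) ** 2 for wi in w_unloaded)
```

In this sketch the loading term trades a small amount of interference rejection for a smaller weight norm, which generally makes the output less sensitive to microphone gain and phase errors.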

The research will examine potential solutions to several needs of the audio interface between a human and a computer (or machine). These needs include reducing background noise and reverberation. Because machines also generate audio signals for the human, the interface must remove this audio feedback from the human's input to the machine. Finally, all of these needs must be satisfied in real time for the human-machine interface to be of any practical use.
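Removal of machine-generated audio from the microphone input is commonly handled with an adaptive echo canceller. The sketch below uses a normalized LMS (NLMS) filter, which is one standard choice and not necessarily the algorithm used in this project. The echo path, step size, filter length, and function names are hypothetical values picked for illustration.

```python
import random

def nlms_echo_cancel(far_end, mic, n_taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the loudspeaker echo from the microphone signal."""
    w = [0.0] * n_taps        # running estimate of the echo-path impulse response
    x_buf = [0.0] * n_taps    # most recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x_buf))    # predicted echo
        e = d - y                                       # residual after cancellation
        norm = sum(xi * xi for xi in x_buf) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x_buf)]
        out.append(e)
    return out, w

def power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)

random.seed(0)
n = 4000
far_end = [random.uniform(-1.0, 1.0) for _ in range(n)]   # machine output (assumed white)
echo_path = [0.6, 0.3, -0.2, 0.1]                         # hypothetical room response

# The microphone hears only the echo here (no talker), so the residual should decay.
mic = [sum(h * far_end[t - k] for k, h in enumerate(echo_path) if t - k >= 0)
       for t in range(n)]

residual, w = nlms_echo_cancel(far_end, mic)

echo_power = power(mic[-500:])
residual_power = power(residual[-500:])
```

The per-sample update keeps the cost proportional to the filter length, which matters for the real-time requirement. A practical system would also need to slow or freeze adaptation while the user is speaking, which this sketch does not attempt.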

The results sought from the project include an enhanced speech input to a machine that will allow subsequent processing algorithms for word recognition, speaker identification, speech compression, etc., to be more effective. The approach is general: it can be used with any microphone configuration and any type of acoustic interface between a human and a machine. Because the array picks up signals remotely, the user need not be connected to the machine by a cable. The approach is "cellular" in that the processor can be modified to track a user who is moving within a room: the user's position is estimated, and the constraints that define the robust adaptive processor are changed to preserve signals generated within the cell that contains the estimated position.

PROJECT REFERENCES

M. W. Hoffman. "Microphone array calibration for robust adaptive processing." In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Lake Mohonk, New Paltz, NY, Oct. 1995.

M. W. Hoffman and K. M. Buckley. "Robust time-domain processing of broadband acoustic data." IEEE Transactions on Speech and Audio Processing, Vol.3, pp. 193-203, May 1995.

M. W. Hoffman, T. Trine, K. M. Buckley, and D. J. Van Tasell. "Robust microphone array processing for hearing aids: Realistic speech enhancement predictions." Journal of the Acoustical Society of America, vol.96:759-771, Aug. 1994.

M. W. Hoffman. "Robust adaptive processing of microphone array data for hearing aids." In Proceedings of the IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, Lake Mohonk, New Paltz, NY, Oct. 1993.

AREA BACKGROUND

Spatial filtering is well suited to reducing reverberation and background noise in speech signals. Unlike single-microphone techniques, it attenuates reverberation, competing speakers, and other noise on the basis of the spatial separation between the desired speaker and the interference. Processing of data from arrays of sensors has long been applied to traditionally military problems such as SONAR, RADAR, and communications. Geophysicists have also used array processing to aid in underground exploration. When signals are processed by a microphone array, sounds from the desired direction should be preserved while other sounds are attenuated. The type of interfering sound (e.g., competing speaker, fan noise, etc.) does not greatly influence the effectiveness of the spatial filter (or beamformer). This is not the case for single-microphone techniques: many single-channel methods that work well for particular types of interfering noise are entirely inappropriate in the presence of a competing speaker.
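As a minimal illustration of this spatial filtering idea, the sketch below implements a fixed delay-and-sum beamformer with integer sample delays. It is the simplest kind of beamformer and is not the robust adaptive processor studied in this project. The array model, the delays, the test tone, and the function names are all assumptions made for the example.

```python
import math

def delay_and_sum(channels, steering_delays):
    """Average the channels after advancing each one by its steering delay (in samples)."""
    n_mics = len(channels)
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, steering_delays):
        for t in range(n):
            src = t + d   # advance by d samples to undo the assumed propagation delay
            if 0 <= src < n:
                out[t] += ch[src] / n_mics
    return out

def simulate_array(signal, arrival_delays):
    """Hypothetical plane-wave model: each microphone sees the signal delayed by whole samples."""
    n = len(signal)
    return [[signal[t - d] if 0 <= t - d < n else 0.0 for t in range(n)]
            for d in arrival_delays]

def power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)

# Assumed setup: 4 microphones and a tone with a period of 8 samples.
n = 256
tone = [math.sin(2 * math.pi * t / 8) for t in range(n)]

look_delays = [0, 0, 0, 0]     # desired talker at broadside: same arrival time at every mic
interf_delays = [0, 2, 4, 6]   # off-axis source: 2 extra samples of delay per microphone

desired = simulate_array(tone, look_delays)
interferer = simulate_array(tone, interf_delays)

# Steer toward the desired talker and compare how much of each source survives.
desired_power = power(delay_and_sum(desired, look_delays))
interf_power = power(delay_and_sum(interferer, look_delays))
```

The near-complete cancellation of the off-axis source in this example follows from the particular tone period and delays chosen. A broadband interferer would be attenuated less by a fixed beamformer, which is one motivation for adaptive processing.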

The current project applies a spatial filter to provide clean acoustic signals to a machine (computer) in a noisy and reverberant room. A voice-controlled system's reliability depends upon a clear, uncorrupted speech signal as input to the automated speech processing system. Virtually all processing algorithms designed for speech signals work better when the input speech is not corrupted by interfering noise and distortion. Speech coders, word recognition systems, and speaker identification systems are often very sensitive to background noise and reverberation. Digitally processed signals from an array of microphones provide enhanced speech input for the machine's automated processing systems. In addition, the processing that improves the speech input quality will not place hardships on the human user of the system, such as a microphone attached to the machine by a cable or a precise fixed location for the human. The project attempts to advance a sophisticated interface between humans and machines that places the burden of processing and inconvenience on the machine rather than the human user.

AREA REFERENCES

B. D. Van Veen and K. M. Buckley. "Beamforming: A versatile approach to spatial filtering." IEEE ASSP Magazine, pp. 4-24, Apr. 1988.

H. Cox, R. M. Zeskind, and M. M. Owen. "Robust adaptive beamforming." IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-35:1365-1376, Oct. 1987.

J. E. Greenberg and P. M. Zurek. "Evaluation of an adaptive beamforming method for hearing aids." Journal of the Acoustical Society of America, vol.91:1662-1676, Mar. 1992.

R. W. Stadler and W. M. Rabinowitz. "On the potential of fixed arrays for hearing aids." Journal of the Acoustical Society of America, vol.94:1332-1342, 1993.

RELATED PROGRAM AREAS

Clearly, those projects in area 2 are the most closely related. Other program areas with some overlap include area 3 (Other Communication Modalities), area 4 (Adaptive Human Interfaces), and area 6 (Intelligent Interactive Systems for Persons with Disabilities). The relationship to these other program areas involves the use of the spatial information that can be extracted or provided via an array of sensors/transducers.

POTENTIAL RELATED PROJECTS

The projects associated with speech processing within program area 2 would represent the most immediate and obvious areas for potential collaboration. The integration of the spatial processing and word recognition processing may yield substantial improvements in the performance of the word recognition system. This is especially true in difficult acoustic environments.

Other program areas that are attempting to ease the burden that human-computer interactions place on the human may be able to exploit user information (such as position in a room, movement, etc.) to better anticipate the needs of the user. The microphone array interface should be able to provide useful cues as to user position, user movements, and histories of such movements. Arrays of sensors allow two separate functions: signal enhancement (i.e., beamforming) and source localization (i.e., direction finding). While the primary emphasis of the current project is signal enhancement, some emphasis could be placed on exploiting the localization capacities of the sensor array.
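The localization function mentioned above can be illustrated with a two-microphone time-delay estimate. The sketch below finds the lag that maximizes the cross-correlation between two channels and converts it to an arrival angle under a far-field assumption. This is one basic direction-finding method, shown only as an example; the sample rate, microphone spacing, source signal, and function names are assumed values.

```python
import math
import random

def estimate_delay(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref` by cross-correlation."""
    n = len(ref)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(n):
            s = t + lag
            if 0 <= s < n:
                score += ref[t] * other[s]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Assumed acquisition parameters for the example.
SAMPLE_RATE_HZ = 16000.0
MIC_SPACING_M = 0.2
SPEED_OF_SOUND = 343.0   # metres per second, approximate value in room air

random.seed(1)
n = 512
source = [random.gauss(0.0, 1.0) for _ in range(n)]   # stand-in for a broadband talker

true_delay = 3   # hypothetical: the second microphone hears the talker 3 samples later
mic_a = source
mic_b = [source[t - true_delay] if t - true_delay >= 0 else 0.0 for t in range(n)]

estimated = estimate_delay(mic_a, mic_b, max_lag=10)

# Far-field conversion from time delay to arrival angle relative to broadside.
sin_theta = SPEED_OF_SOUND * (estimated / SAMPLE_RATE_HZ) / MIC_SPACING_M
angle_deg = math.degrees(math.asin(sin_theta))
```

A sequence of such estimates over time could supply the position and movement cues described above, and the same estimate could be used to decide which cell the beamformer should preserve.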
