DEFORMABLE TEMPLATES FOR FACE DESCRIPTION,
RECOGNITION, INTERPRETATION, AND LEARNING

A.L. Yuille

DAS, Harvard University

CONTACT INFORMATION

Prof. A.L. Yuille, G12e Pierce Hall, Division of Applied Sciences, 29 Oxford Street, Cambridge, MA 02138. U.S.A.

(617)495-9526
yuille@hrl.harvard.edu
FAX (617) 496-6404

Note: address will be changed after January 1 to: Smith-Kettlewell Eye Research Institute, 2232 Webster Street, San Francisco, CA 94115. (415) 561-1620 (Smith-Kettlewell)
yuille@skivs.ski.org

WWW PAGE

The Harvard WWW page is at http://hrl.harvard.edu but it has not been updated because I am leaving soon. I will arrange a WWW page at Smith-Kettlewell when I arrive.

PROGRAM AREA

Other Communication Modalities

KEYWORDS

Face modelling, recognition, and understanding

PROJECT SUMMARY

The goal of our work is to recognize and interpret faces from visual stimuli. The intensity image of a face is a complicated function of the geometry of the face, its reflectance properties, and the lighting sources. Our approach is based on deformable template models: face models with explicit deformations corresponding to (i) lighting variations and (ii) geometrical variations due to changes in expression and viewpoint. In this framework, recognizing a face corresponds to showing that a given image region can be well matched to a deformable template model by adjusting the deformation parameters. The face can then be described by the resulting values of these parameters.
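
To make the matching step concrete, the following is a minimal sketch of template fitting as parameter optimization. It is illustrative only: the template here is a handful of basis images combined linearly (standing in for the lighting model) and then translated (standing in for the geometric warp), and the optimizer, data, and names are assumptions rather than the project's implementation.

    import numpy as np
    from scipy.ndimage import shift
    from scipy.optimize import minimize

    def synthesize(p, basis):
        """Render the template: a linear combination of basis images (lighting
        parameters) followed by a simple geometric deformation (here a 2-D shift)."""
        n = len(basis)
        lit = np.tensordot(p[:n], basis, axes=1)   # lighting coefficients
        return shift(lit, p[n:n + 2], order=1)     # warp parameters (dy, dx)

    def match_cost(p, image, basis):
        """Sum-of-squares mismatch between the image region and the synthesized template."""
        return np.sum((image - synthesize(p, basis)) ** 2)

    def fit_template(image, basis):
        """Adjust the deformation parameters so the template explains the image;
        the fitted parameters then serve as a description of the face."""
        p0 = np.zeros(len(basis) + 2)
        p0[0] = 1.0
        res = minimize(match_cost, p0, args=(image, basis), method='Powell')
        return res.x, res.fun

    # Toy usage: the "face" is a shifted combination of the basis images.
    rng = np.random.default_rng(0)
    basis = rng.normal(size=(3, 32, 32))
    image = shift(2.0 * basis[0] - 0.5 * basis[1], (1.5, -2.0), order=1)
    params, residual = fit_template(image, basis)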

We have pursued two approaches to this problem. The first is based on two-dimensional face templates. The second attempts to learn and recognize three-dimensional models.

The two-dimensional face template work is summarized in the PhD thesis of Peter Hallinan. It uses a linear model of lighting variation (P.W. Hallinan. ``A low-dimensional representation of human faces for arbitrary lighting conditions''. In Proc. CVPR. 1994) to account for changes in illumination, together with two-dimensional spatial warps of the images to model differences between faces and small changes in viewing direction or expression. The resulting system (P.W. Hallinan. PhD thesis. DAS, Harvard. 1995), subject to these restrictions, is able to recognize faces under extreme changes of lighting, to distinguish faces from non-faces, and to detect faces in an image.
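
The linear lighting model can be illustrated with a short sketch: images of one face under many lighting conditions approximately span a low-dimensional subspace, whose basis (the eigenimages) is obtained from a singular value decomposition, and a new image of that face is then explained by a least-squares fit of a few coefficients. The choice of five components echoes the ``five plus or minus two'' result cited below; the function names and data handling are assumptions, not the thesis code.

    import numpy as np

    def lighting_basis(images, n_components=5):
        """images: array (num_images, h, w) of one face under varying illumination.
        Returns the leading eigenimages as rows of an (n_components, h*w) matrix."""
        X = images.reshape(len(images), -1)            # one flattened image per row
        U, S, Vt = np.linalg.svd(X, full_matrices=False)
        return Vt[:n_components]

    def fit_lighting(image, basis):
        """Least-squares lighting coefficients for a new image of the same face;
        a small residual means the illumination is well explained by the basis."""
        coeffs, *_ = np.linalg.lstsq(basis.T, image.ravel(), rcond=None)
        residual = np.linalg.norm(image.ravel() - basis.T @ coeffs)
        return coeffs, residual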

This work has been extended by: (i) showing that the linear lighting model works for a range of objects (R. Epstein, P.W. Hallinan, and A.L. Yuille. ``Five plus or minus two eigenimages suffice''. In Proc. IEEE Workshop on Physics-Based Modelling. 1995), and (ii) determining under what assumptions the two-dimensional spatial image warps correspond to three-dimensional shape changes (A.L. Yuille, M. Ferraro, and T. Zhang. ``Shape from Warping''. In preparation. 1995).

More recently, we have attempted to improve on this work by using three-dimensional object models. These models are learnt from data consisting only of a set of images of the object taken from the same viewpoint under different lighting conditions. We have shown that Lambertian lighting models are sufficiently accurate for the face provided they are fitted robustly, so that cast shadows and specularities can be treated as residuals. If prior knowledge of faces is used, these three-dimensional models can be learnt from a single image. Once the three-dimensional models are learnt, recognition proceeds by finding the specific face and lighting conditions that best synthesize the image (P.N. Belhumeur, A.L. Yuille, and R. Epstein. ``Learning and Recognizing Objects using Illumination Subspaces''. In preparation. 1995).
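
A rough sketch of the illumination-subspace idea follows. Under a Lambertian model each pixel has intensity max(0, b(x).s), where b(x) is the albedo-scaled surface normal and s the light source, so images of a fixed face from a fixed viewpoint lie near a three-dimensional subspace; recognition compares how well each learnt subspace can synthesize the input image. The clipping at zero below is only a crude stand-in for the robust treatment of cast shadows and specularities in the cited work, and all names and data are illustrative.

    import numpy as np

    def learn_subspace(images):
        """images: (num_images, h, w) of one face, fixed viewpoint, varying lighting.
        Returns B of shape (h*w, 3), a rank-3 approximation to b(x) at every pixel."""
        X = images.reshape(len(images), -1)
        U, S, Vt = np.linalg.svd(X, full_matrices=False)
        return Vt[:3].T

    def synthesis_error(image, B):
        """Best light source for this face model and the residual it leaves behind."""
        y = image.ravel()
        s, *_ = np.linalg.lstsq(B, y, rcond=None)    # least-squares light direction
        rendered = np.clip(B @ s, 0.0, None)         # attached shadows cannot go negative
        return np.linalg.norm(y - rendered)

    def recognize(image, models):
        """models: dict mapping a person's name to their learnt basis B.
        Pick the face whose subspace best synthesizes the image."""
        return min(models, key=lambda name: synthesis_error(image, models[name]))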

PROJECT REFERENCES

P.W. Hallinan. ``A low-dimensional representation of human faces for arbitrary lighting conditions''. In Proc. CVPR. 1994.

P.W. Hallinan. PhD thesis. Division of Applied Sciences, Harvard University. 1995.

R. Epstein, P.W. Hallinan, and A.L. Yuille. ``Five plus or minus two eigenimages suffice''. In Proc. IEEE Workshop on Physics-Based Modelling. 1995.

A.L. Yuille, M. Ferraro, and T. Zhang. ``Shape from Warping''. In preparation. 1995.

P.N. Belhumeur, A.L. Yuille, and R. Epstein. ``Learning and Recognizing Objects using Illumination Subspaces.'' In preparation. 1995.

AREA BACKGROUND

The goal of computer vision is to recognize objects, determine their shape and position, and describe and understand the visual scene. This turns out to be a surprisingly difficult task and, in fact, at least fifty percent of the human cortex is devoted to it. In recent years, theoretical advances and the increased power of computers have made it practical to begin using computer vision systems.

AREA REFERENCES

Robot Vision. B.K.P. Horn. MIT Press, Cambridge, MA. 1986.

Active Vision. Eds. A. Blake and A.L. Yuille. MIT Press, Cambridge, MA. 1992.

RELATED PROGRAM AREAS

Speech and Natural Language Understanding.

Adaptive Human Interfaces.

Intelligent Interactive Systems for Persons with Disabilities.

POTENTIAL RELATED PROJECTS

Our work could be combined with lip reading, yielding an interaction with Speech and Natural Language Understanding. Facial recognition and expression interpretation are potentially highly important for human-computer interfaces, which links to Adaptive Human Interfaces. Finally, face recognition and expression interpretation systems would be highly useful to blind people, which connects to Intelligent Interactive Systems for Persons with Disabilities.