Computational Geometry
Department of Computer Science
Statistical Data Depth at Tufts

Statisticians have recently developed the notion of data depth for non-parametric multivariate data analysis (see, for example, [1], [2]). This concept provides center-outward orderings of points in Euclidean space of any dimension and leads to a non-parametric multivariate statistical analysis in which no distributional assumptions are needed.

A data depth measures how deep (or central) a given point x in R^d is relative to F, a probability distribution in R^d (assuming {X_1, ..., X_n} is a random sample from F), or relative to a given data cloud.
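As a concrete illustration of depth relative to a data cloud, the following minimal Python sketch computes the halfspace (Tukey) depth [6] of a point in the plane, taken here as the smallest number of data points in any closed halfplane whose boundary line passes through x. It is a brute-force O(n^2) check written for clarity, not the algorithms of [3] or [4]; the function name halfspace_depth and the example cloud are assumptions made for illustration.

```python
def halfspace_depth(x, points):
    """Halfspace (Tukey) depth of x relative to a 2D data cloud.

    Brute force: the count of points in a closed halfplane through x only
    changes when the boundary line passes through a data point, so it is
    enough to test every such direction, rotated by an infinitesimal angle.
    """
    vecs = [(px - x[0], py - x[1]) for px, py in points]
    # Data points equal to x lie in every closed halfplane through x.
    coincident = sum(1 for vx, vy in vecs if vx == 0 and vy == 0)
    nonzero = [(vx, vy) for vx, vy in vecs if vx != 0 or vy != 0]
    if not nonzero:
        return coincident

    best = len(points)
    for wx, wy in nonzero:
        # The two halfplane normals perpendicular to the direction w.
        for ux, uy in ((-wy, wx), (wy, -wx)):
            count = coincident
            for vx, vy in nonzero:
                dot = ux * vx + uy * vy
                # Points on the boundary line are counted only if a tiny
                # counterclockwise rotation of the normal keeps them inside.
                if dot > 0 or (dot == 0 and -uy * vx + ux * vy > 0):
                    count += 1
            best = min(best, count)
    return best


# Illustrative cloud: the four corners of a square plus its center.
cloud = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(halfspace_depth((1, 1), cloud))    # 3: the center is the deepest point
print(halfspace_depth((0, 0), cloud))    # 1: a corner is on the outside
print(halfspace_depth((10, 10), cloud))  # 0: a point outside the cloud
```

Larger values mean a more central point, which gives the center-outward ordering. With integer coordinates the arithmetic is exact; with floating-point data, nearly collinear points may be classified inconsistently.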

Some examples of data depth are Halfspace Depth [6], Simplicial Depth [2], Convex Hull Peeling Depth [7], and Regression Depth [5] (which is the depth of a hyperplane relative to a set of points). All of these depths are affine invariant: the depth values remain the same after the data are transformed by any nonsingular affine transformation. Different notions of data depth capture different statistical characteristics of the underlying distribution.
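To make affine invariance concrete, here is a small Python sketch of the sample simplicial depth [2] in the plane, counted as the number of closed triangles with vertices at data points that contain x (the usual definition divides this count by the total number of triangles; the raw count is used here to keep the arithmetic exact). The helper names, the example cloud, and the particular affine map are hypothetical choices for illustration; degenerate (collinear) triples are skipped.

```python
from itertools import combinations


def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_closed_triangle(x, a, b, c):
    """True if x lies inside or on the boundary of the nondegenerate triangle abc."""
    d1, d2, d3 = orient(a, b, x), orient(b, c, x), orient(c, a, x)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def simplicial_depth(x, points):
    """Number of nondegenerate data triangles that contain x (unnormalized)."""
    return sum(
        1
        for a, b, c in combinations(points, 3)
        if orient(a, b, c) != 0 and in_closed_triangle(x, a, b, c)
    )


def affine(p):
    """A hypothetical nonsingular affine map (linear part has determinant 7)."""
    return (2 * p[0] + p[1] + 3, -p[0] + 3 * p[1] - 5)


cloud = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 2), (3, 1)]
x = (2, 2)
before = simplicial_depth(x, cloud)
after = simplicial_depth(affine(x), [affine(p) for p in cloud])
print(before == after)  # the depth of x is unchanged by the transformation
```

Because an affine map sends triangles to triangles and preserves containment, every triangle that contains x before the transformation still contains its image afterwards, so the count cannot change.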

Depth contours, constructed by enclosing all points of depth k or higher, are especially powerful for visualizing and quantifying data. Simple graphs, in many cases two-dimensional, can be used to visualize the statistical characteristics of the data set. The potential is enormous for the analysis of massive data sets in areas such as quality control, aviation safety analysis, clinical data mining, biological imaging analysis, and statistical process control.
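The idea of enclosing all points of at least a given depth can be sketched with convex hull peeling depth [7], chosen here only because its contours are easy to compute: the outermost convex hull holds the points of depth 1, the hull of what remains holds depth 2, and so on, so each contour is simply the hull of one layer. This is a plain Python sketch under that assumption, not the topological sweep method of [3] and [4] for halfspace depth contours; all function names are illustrative.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def on_segment(p, a, b):
    """True if p lies on the closed segment ab."""
    return (
        cross(a, b, p) == 0
        and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
        and min(a[1], b[1]) <= p[1] <= max(a[1], b[1])
    )


def peeling_layers(points):
    """layers[k-1] holds the points of convex hull peeling depth k."""
    remaining = list(points)
    layers = []
    while remaining:
        hull = convex_hull(remaining)
        edges = list(zip(hull, hull[1:] + hull[:1]))
        # Peel every point on the hull boundary, not only the vertices.
        layer = [p for p in remaining if any(on_segment(p, a, b) for a, b in edges)]
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers


# Illustrative cloud: two nested squares around a center point.
cloud = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 1), (3, 1), (3, 3), (1, 3), (2, 2)]
for depth, layer in enumerate(peeling_layers(cloud), start=1):
    # The contour of this depth encloses every point of equal or higher depth.
    print(depth, convex_hull(layer))
```

Plotting the printed polygons one inside another gives a picture of nested contours in which the innermost region marks the deepest points.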

Figure: Halfspace depth contours for a data set of 50 points drawn from a bivariate normal distribution with mean (0,0) and covariance 4 times the identity. (rousseeuw.eps)
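A data set of the kind described in the caption can be generated with the short Python sketch below. A covariance of 4 times the identity means the two coordinates are independent, each with variance 4 and therefore standard deviation 2. The function name, the seed, and the use of Python's random module are assumptions for illustration; this is not the data behind the original figure.

```python
import random


def sample_bivariate_normal(n, variance=4.0, seed=None):
    """Draw n points from a bivariate normal with mean (0,0) and covariance variance*I."""
    rng = random.Random(seed)
    sd = variance ** 0.5  # standard deviation of each independent coordinate
    return [(rng.gauss(0.0, sd), rng.gauss(0.0, sd)) for _ in range(n)]


# 50 points, as in the caption; the seed is arbitrary and only fixes the output.
cloud = sample_bivariate_normal(50, seed=1)
print(len(cloud))
```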

Papers and Presentations

[1] "Multivariate analysis by data depth: descriptive statistics, graphics and inference", Liu, R., Parelius, J. and Singh, K., The Annals of Statistics (27), 783-858, 1999.

[2] "On a notion of data depth based on random simplices", Liu, R., The Annals of Statistics (18), 405-414, 1990.

[3] "Efficient Computation of Location Depth Contours by Methods of Combinatorial Geometry", K. Miller, S. Ramaswami, P. Rousseeuw, T. Sellares, D. Souvaine, I. Streinu, A. Struyf, Statistics and Computing, 2003. Postscript version.

[4] "Fast implementation of depth contours using topological sweep", K. Miller, S. Ramaswami, P. Rousseeuw, T. Sellares, D. Souvaine, I. Streinu, A. Struyf, Proceedings of the Twelfth ACM-SIAM Symposium on Discrete Algorithms, Washington, DC, January 2001. Postscript version.

[5] "Regression depth", Rousseeuw, P. J. and Hubert, M., J. Amer. Statist. Assoc. (94), 388-433, 1999.

[6] "Mathematics and the picturing of data", Tukey, J. W., Proceedings of the International Congress of Mathematicians, Vancouver, B.C., 1974, Vol. 2, 523-531.

[7] "Convex hull peeling", Eddy, W., in COMPSTAT, 42-47, 1982.

[8] "Computational Geometry and Statistical Depth Measures", E. Rafalin, D. Souvaine, in Theory and Applications of Recent Robust Methods, edited by M. Hubert, G. Pison, A. Struyf and S. Van Aelst, Series: Statistics for Industry and Technology, Birkhäuser, Basel, 2004. Postscript version, PDF version.

[9] "Computational Geometry, Data Depth and Robust Statistics", E. Rafalin, D. Souvaine, Interface 2004, Baltimore, MD. Postscript version, PDF version.

[*] A PowerPoint presentation describing the code developed in the department for analysis based on the notion of data depth (presented at a DIMACS workshop on Data Depth, May 2003).

[*] A PowerPoint presentation surveying the connection between computational geometry and depth-based statistics (presented at the International Conference on Robust Statistics, ICORS 03, Antwerp, Belgium, July 2003).