Technical Reports

Describing Multivariate Distributions with Nonlinear Variation Using Data Depth
Authors: Izem, Rima; Rafalin, Eynat; Souvaine, Diane L.
Date: March 2006

Researchers in many fields, including biology, medicine, chemistry, and engineering, collect multivariate data with complex distributions, such as samples of curves, images, or shapes. There is an increasing need to describe the distributions of such high-dimensional data in a meaningful and concise way. Data depth methods provide a concise, robust, nonparametric description of multivariate distributions that is also useful for inference. However, classical data depth definitions and methods are ill-suited to describing these distributions when the space of variation of the data is non-convex or nonlinear.
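To make the limitation concrete, the following is a minimal sketch of one classical depth function, the sample simplicial depth in the plane: the fraction of closed triangles with vertices at sample points that contain the query point. The function names and the ring-shaped sample are illustrative assumptions and do not come from the report.

```python
from itertools import combinations
import math


def _orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_triangle(p, a, b, c):
    """True if p lies in the closed triangle abc (boundary included)."""
    d1, d2, d3 = _orient(a, b, p), _orient(b, c, p), _orient(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def simplicial_depth(p, points):
    """Fraction of triangles spanned by sample points that contain p."""
    triangles = list(combinations(points, 3))
    inside = sum(in_triangle(p, a, b, c) for a, b, c in triangles)
    return inside / len(triangles)


# Hypothetical sample: 15 points evenly spaced on the unit circle, so the
# support of the data is a ring (non-convex) with an empty interior.
ring = [(math.cos(2 * math.pi * k / 15), math.sin(2 * math.pi * k / 15))
        for k in range(15)]

depth_center = simplicial_depth((0.0, 0.0), ring)  # no data anywhere near here
depth_sample = simplicial_depth(ring[0], ring)     # an actual observation
print(depth_center, depth_sample)
```

On this sample the empty center of the ring receives a higher depth than any observed point, which is the kind of behavior that makes classical depth misleading on non-convex supports.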

This project presents novel data depth functions that extend data depth concepts to describe the variation of multivariate data when the space of variation is a manifold or the result of nonlinear variation in the data. This approach will make it possible to quantify statistical properties of the probability distribution that are useful for dimensionality reduction, variability decomposition, and inference. The research will progress in several stages. The first stage will focus on characterizing the class of distributions for which our definition coincides with classical definitions of data depth such as half-space depth and simplicial depth. In the second stage, practical methods for estimating this depth function based on proximity graphs will be developed. The algorithms will use existing methods for manifold learning and for the representation of high-dimensional manifolds. The third stage will focus on creating statistical analysis and quantification methods, specifically methods for detecting outliers or skewness in a sample, for dimensionality reduction, for describing variation, and for inference. As part of the research, we will quantify the sampling conditions required for the estimators to approximate the underlying distributions and suggest additional methods that improve the behavior of the suggested estimators. To conclude the research, an experimental study will be conducted with varying graph constructions on point sets, and the methods will be applied to the analysis of real-life data sets.
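As a rough illustration of why a proximity graph can capture nonlinear variation, the sketch below builds a k-nearest-neighbor graph on a point set and measures distances along its edges, in the spirit of existing manifold learning methods such as Isomap. It is not the depth function proposed in the report; the choice of a k-nearest-neighbor graph, the value of k, the function names, and the half-circle sample are all assumptions made for illustration.

```python
import heapq
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph as {i: {j: euclidean_length}}."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        for d, j in nearest:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def graph_distances(graph, source):
    """Dijkstra shortest-path lengths from source along graph edges."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical sample: 21 points on a half circle of radius 1.
arc = [(math.cos(math.pi * t / 20), math.sin(math.pi * t / 20))
       for t in range(21)]

graph = knn_graph(arc, k=2)
along = graph_distances(graph, 0)[20]    # distance following the data
straight = math.dist(arc[0], arc[20])    # Euclidean shortcut across the gap
print(along, straight)
```

The graph distance between the two ends of the arc follows the curve and comes out close to its length, whereas the Euclidean distance cuts straight across the region that contains no data.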
