Technical Reports
TR-2006-2
Describing Multivariate Distributions with Nonlinear Variation Using Data Depth
Authors: Izem, Rima; Rafalin, Eynat; Souvaine, Diane L.
Date: March 2006
Pages: 20
Abstract:
Researchers in many fields, including biology, medicine, chemistry, and
engineering, collect multivariate data such as samples of curves, images, or
shapes drawn from complex multivariate distributions. There is an increasing
need to describe the distributions of such high-dimensional data in a
meaningful and concise way. Data depth methods provide a concise, robust,
nonparametric way to describe multivariate distributions that is also useful
for inference. However, classical data depth definitions and methods are
ill-suited to describing these distributions when the space of variation of
the data is non-convex or nonlinear.
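To make the last point concrete, the following minimal sketch approximates the classical half-space (Tukey) depth in the plane and applies it to a sample lying on a ring, a simple non-convex support. The function name `halfspace_depth_2d`, the direction-sampling approximation, and the 12-point ring are illustrative assumptions and do not come from the report.

```python
import math

def halfspace_depth_2d(p, sample, n_directions=360):
    """Approximate half-space (Tukey) depth of p with respect to sample.

    The depth is the smallest fraction of sample points contained in any
    closed half-plane whose boundary passes through p. Only a finite set of
    directions is scanned here, so the result is an upper bound on the
    exact depth (an assumption made to keep the sketch short).
    """
    px, py = p
    best = len(sample)
    for j in range(n_directions):
        # Half-step offset keeps directions away from the ring's symmetry axes.
        theta = 2.0 * math.pi * (j + 0.5) / n_directions
        ux, uy = math.cos(theta), math.sin(theta)
        count = sum(
            1 for (x, y) in sample
            if ux * (x - px) + uy * (y - py) >= 0.0
        )
        best = min(best, count)
    return best / len(sample)

# Hypothetical sample: 12 equally spaced points on the unit circle.
ring = [(math.cos(k * math.pi / 6), math.sin(k * math.pi / 6)) for k in range(12)]

depth_center = halfspace_depth_2d((0.0, 0.0), ring)  # no data near this point
depth_on_ring = halfspace_depth_2d(ring[0], ring)    # an actual observation
```

In this toy sample the empty center of the ring receives a much higher depth than any observed point, which is the kind of behavior that makes classical depth hard to interpret on non-convex supports.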
This project presents novel data depth functions that extend data depth
concepts to describe the variation of multivariate data when the space of
variation is a manifold or the result of nonlinear variation in the data.
The method will make it possible to quantify statistical properties of the
probability distribution that are useful for dimensionality reduction,
variability decomposition, and inference. The research will progress in
several stages. The first stage will focus on characterizing the class of
distributions for which our definition coincides with classical definitions
of data depth such as half-space depth and simplicial depth. In the second
stage, practical methods of estimating this depth function based on proximity
graphs will be developed. The algorithms will use existing methods for
manifold learning and for the representation of high-dimensional manifolds.
The third stage will focus on creating statistical analysis and
quantification methods, specifically methods for detecting outliers or
skewness in a sample, for dimensionality reduction, for describing variation,
and for inference. As part of the research, we will quantify the sampling
conditions required for the estimators to approximate the underlying
distributions and suggest additional methods that improve the behavior of
the proposed estimators. To conclude the research, an experimental study
will be conducted using various graph constructions on point sets, and the
methods will be applied to the analysis of real-life data sets.
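The abstract does not state the proposed depth function, so the sketch below only illustrates the general idea of measuring centrality along a proximity graph instead of in the ambient space. It builds a symmetric k-nearest-neighbor graph, computes shortest-path distances with Dijkstra's algorithm, and scores each point by the inverse of its graph eccentricity. The score, the helper names, the choice k = 2, and the arc-shaped sample are all assumptions for illustration and are not the report's definitions.

```python
import heapq
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph as {i: {j: edge_length}}."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        by_distance = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in by_distance[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph

def graph_distances(graph, source):
    """Shortest-path lengths from source along graph edges (Dijkstra)."""
    dist = {v: math.inf for v in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def graph_centrality_scores(points, k=2):
    """Stand-in depth score: 1 / (1 + graph eccentricity) for each point."""
    graph = knn_graph(points, k)
    scores = []
    for i in range(len(points)):
        eccentricity = max(graph_distances(graph, i).values())
        scores.append(1.0 / (1.0 + eccentricity))
    return scores

# Hypothetical sample: 21 points along three quarters of the unit circle.
arc = [
    (math.cos(t), math.sin(t))
    for t in (1.5 * math.pi * i / 20 for i in range(21))
]
scores = graph_centrality_scores(arc)
deepest = max(range(len(arc)), key=scores.__getitem__)
```

On this arc the highest score falls on the middle observation of the curve, while the two endpoints score lowest. A different graph construction or a different score would change the details, which is why the sampling conditions and graph choices mentioned above matter.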