Computational Geometry

Department of Computer Science

LMS Regression in O(n^{2}) Time Using Guided Topological Sweep

The * Least Median of Squares (LMS) Regression Line* of a point
set is the line that **minimizes the median squared residual**. As a
comparison, the
more familiar ordinary least squares (OLS) regression line minimizes the
sum of the squared residual.
The advantage of the LMS line over the OLS line is that at least 50% of
the data would need to be corrupt in order to skew the line (while for
the OLS line, a single corrupt data point can give the resulting
regression line an arbitrarily large slope).

We present an implementation of the guided topological sweep algorithm that computes the Least median of square regression line of a point set, based on the algorithm presented in [3].

The code deals with degeneracies in the data set and with multiple identical points. The implementation is based on minor modifications to this code.

Figure

The green lines determine the smallest width slab that enclose 50% of the data points. The LMS regression line is the line exactly in the middle of the slab

Papers

[1] "

Topological Sweep in Degenerate cases", E. Rafalin, D. Souvaine, I. Streinu,ALENEX 02, January 2002, San-Francisco, CA,Postscript version, Pdf version

[2] "Topologically Sweeping an arrangement",H.Edelsbrunner, L. Guibas,J. of Computer and System Sciences 38, 165-194, 1989

[3] "Computing Median-of-Squares Regression Lines and Guided Topological Sweep ",H.Edelsbrunner, D. SouvaineJournal of the American Statistical Association, 85, 1990, 115-119

Code

A README file describing how to use the code.

A tar.zip version of the code - NEW - supports vertical slabs!

Related sites