Gaussian Kernel Width Exploration and Cone Cluster Labeling for Support Vector Clustering

November 28, 2007
2:50 pm - 4:00 pm
Halligan 111

Abstract

Clustering groups data points so that intra-cluster similarity is maximized while inter-cluster similarity is minimized. It has applications in fields such as data mining and bioinformatics. Support Vector Clustering (SVC) is a clustering approach, based on Support Vector Machine concepts from machine learning, that can identify arbitrarily shaped cluster boundaries. The execution time of SVC depends heavily on three factors: the choice of width for the kernel function that determines a nonlinear transformation of the input data, the solution of a quadratic program, and the way the output of the quadratic program is used to produce clusters. This work builds on our prior SVC research in two ways. First, we propose a method for identifying a kernel width value in a region where our experiments suggest that the clustering structure is changing significantly. This value can serve as the starting point for efficient exploration of the space of kernel width values. Second, we offer a technique, called Cone Cluster Labeling, that uses the output of the quadratic program to build clusters in a novel way that avoids an important deficiency of previous methods. Our experimental results cover both two-dimensional and high-dimensional data sets.

This is joint work with Sei-Hyung Lee.