A Preliminary Study: Parallel K-means Clustering Algorithm on NOWs

September 29, 1999

2:50 pm - 4:00 pm

Halligan 111

Speaker: Sanpawat Kantabutra, Ph.D. student in CS, Tufts University

Abstract

Despite its simplicity and its linear time, a serial K-means algorithm's time complexity remains expensive when it is applied to a problem of large size of multidimensional vectors. In this paper we show an improvement by a factor of O(K/2), where K is the number of desired clusters, by applying theories of parallel computing to the algorithm. In addition to time improvement, the parallel version of K- means algorithm also enables the algorithm to run on larger collective memory of multiple machines when the memory of a single machine is insufficient to solve a problem. We show algorithm to run on larger collective memory of multiple machines when the memory of a single machine is insufficient to solve a problem. We show that a problem size can be scaled upto O(K) times a problem size on a single machine.