Consensus Clustering and its Applications
Consensus clustering aims to find a single partition that agrees as much as possible with a set of existing basic partitions, and it has emerged as a promising way to find cluster structures in heterogeneous data. It is widely recognized that consensus clustering is effective at generating robust clustering results, detecting bizarre clusters, handling noise, outliers and sample variations, and integrating solutions from multiple distributed sources of data or attributes. Unlike traditional clustering methods, which operate directly on the data matrix, consensus clustering takes a set of diverse basic partitions as its input. Consensus clustering is therefore in essence a fusion problem rather than a traditional clustering problem. In this talk, I will introduce the categories of consensus clustering; present K-means-based Consensus Clustering (KCC), which exactly transforms the consensus clustering problem into a (weighted) K-means clustering problem with theoretical support; discuss some key factors that affect consensus clustering; and extend KCC to Fuzzy C-means Consensus Clustering. The talk also covers how to employ consensus clustering for heterogeneous, multi-view, incomplete and big data clustering. Derived from consensus clustering, a partition-level constraint is proposed as a new kind of side information for semi-supervised clustering. Along this line, several interesting applications based on the partition-level constraint, such as feature selection, domain adaptation and gene stratification, are presented to demonstrate the extensibility of consensus clustering. Some code is available for practical use.
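To make the idea of the abstract concrete, here is a minimal Python sketch of how a set of basic partitions can be fused by running K-means on their concatenated one-hot encoding. It is an illustration only, not the speaker's released code: the helper names (kmeans, binary_matrix, consensus), the toy data, and the choice of plain squared Euclidean distance with equal weights for all basic partitions are assumptions of this sketch.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means with squared Euclidean distance; returns labels."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct rows of X.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels

def binary_matrix(partitions):
    """One-hot encode each basic partition and concatenate column-wise."""
    blocks = []
    for labels in partitions:
        labels = np.asarray(labels)
        values = np.unique(labels)
        blocks.append((labels[:, None] == values[None, :]).astype(float))
    return np.hstack(blocks)

def consensus(partitions, k, seed=0):
    """Fuse basic partitions by clustering their binary encoding (assumed setup)."""
    return kmeans(binary_matrix(partitions), k, seed=seed)

# Toy data (hypothetical): two well-separated Gaussian blobs.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])

# Diverse basic partitions: different seeds and different cluster numbers.
basic = [kmeans(X, k, seed=s) for s, k in enumerate([2, 3, 4])]

B = binary_matrix(basic)       # the only input the fusion step sees
final = consensus(basic, k=2)  # consensus partition over the 40 points
```

Note that the fusion step never touches X: it works only on the basic partitions, which is the sense in which consensus clustering is a fusion problem. Weighting individual basic partitions or using other distances would require changes that this sketch does not attempt.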
Hongfu Liu is a final-year Ph.D. candidate in the Department of Electrical & Computer Engineering, Northeastern University (NEU), supervised by Prof. Yun (Raymond) Fu. Before joining NEU, he received his bachelor's and master's degrees in management from Beihang University, working with Prof. Junjie Wu, in 2011 and 2014, respectively. His research interests lie in data mining and machine learning, with a particular interest in ensemble learning.