Research Talk: Clustering with Multiple Group Constraints

April 6, 2011
Halligan 111B
Speaker: Jingjing Liu, Tufts University


Semi-supervised (constrained) clustering groups objects into clusters while incorporating additional information that steers the result toward a desirable clustering. Often, more than one kind of domain knowledge is available. For example, when clustering a pile of apples, one may want to separate big apples from small ones, and also green apples from red ones.

In this talk, I will start by introducing previous work on clustering with pairwise constraints: penalized probabilistic clustering (PPC) and class-level PPC (CPPC). Then I will discuss how to combine multiple sets of group constraints in clustering. I will also present a novel algorithm that performs non-redundant clustering by applying CPPC.

Because constrained clustering is a relatively new area in machine learning, many questions remain unsolved and many potential applications remain unexplored. Finally, I will present open problems arising from this area that interest me as directions for future research.