Interesting subgraph mining by Markov Chain Monte Carlo

April 17, 2009
10:00am - 11:00 am
Halligan 106
Speaker: Mohammad Hassan
Host: Carla Brodley


In This talk, I will first give an overview on graph mining and then I will discuss about our recent work on exploring randomized algorithms in graph mining domain to obtain a small sample of {\em interesting} subgraph patterns. The sampling paradigm is useful for mining from database of large graphs for which the traditional graph mining algorithms do not scale well. We adopt Markov Chain Monte Carlo (MCMC) sampling for this task that performs a random walk on the subgraph partial order with locally constructed transition probability matrix. The method is scalable as it visits only a small part of the large combinatorial search space. By treating the normalized value of the {interesting-ness score} of a subgraph as the desired stationary distribution of this random walk, we sample subgraphs that are {interesting} with respect to the users' application need. For instance, for exploratory data analysis, we propose uniform sampling of frequent subgraph pattern; for frequent pattern summarization, we propose uniform sampling of $k$ maximal frequent subgraphs; for graph classification, we perform Metropolis-Hastings based sampling to mine discriminatory subgraphs, etc. Another potential benefit of our sampling based paradigm is that it is generic and is applicable to all different kinds of patterns.

Bio: Mohammad Hasan is a PhD candidate, Computer Science at RPI. His advisor is Dr. Mohammed Zaki. He holds an MS from the University of Minnesota, Twin Cities and a BSc(Engg) from Bangladesh University of Engineering and Technology, both in Computer Science. His broad research interest is data mining, information retrieval, machine learning, social network analysis, and bioinformatics. He has published more than 15 papers in premier data mining, machine learning and bioinformatics journals and conferences. He served as a program committee Member for the SIGKDD 2009 conference.