# Interesting subgraph mining by Markov Chain Monte Carlo

April 17, 2009
10:00am - 11:00 am
Halligan 106
In This talk, I will first give an overview on graph mining and then I will discuss about our recent work on exploring randomized algorithms in graph mining domain to obtain a small sample of {\em interesting} subgraph patterns. The sampling paradigm is useful for mining from database of large graphs for which the traditional graph mining algorithms do not scale well. We adopt Markov Chain Monte Carlo (MCMC) sampling for this task that performs a random walk on the subgraph partial order with locally constructed transition probability matrix. The method is scalable as it visits only a small part of the large combinatorial search space. By treating the normalized value of the {interesting-ness score} of a subgraph as the desired stationary distribution of this random walk, we sample subgraphs that are {interesting} with respect to the users' application need. For instance, for exploratory data analysis, we propose uniform sampling of frequent subgraph pattern; for frequent pattern summarization, we propose uniform sampling of $k$ maximal frequent subgraphs; for graph classification, we perform Metropolis-Hastings based sampling to mine discriminatory subgraphs, etc. Another potential benefit of our sampling based paradigm is that it is generic and is applicable to all different kinds of patterns.