# Interesting subgraph mining by Markov Chain Monte Carlo

## Abstract

In This talk, I will first give an overview on graph mining and then
I will discuss about our recent work on exploring randomized
algorithms in graph mining domain to obtain a small sample of {\em
interesting} subgraph patterns. The sampling paradigm is useful for
mining from database of large graphs for which the traditional graph
mining algorithms do not scale well. We adopt Markov Chain Monte
Carlo (MCMC) sampling for this task that performs a random walk on
the subgraph partial order with locally constructed transition
probability matrix. The method is scalable as it visits only a small
part of the large combinatorial search space. By treating the
normalized value of the {*interesting-ness score*} of a
subgraph as the desired stationary distribution of this random walk,
we sample subgraphs that are {*interesting*} with respect to
the users' application need. For instance, for exploratory data
analysis, we propose uniform sampling of frequent subgraph pattern;
for frequent pattern summarization, we propose uniform sampling of
$k$ maximal frequent subgraphs; for graph classification, we perform
Metropolis-Hastings based sampling to mine discriminatory subgraphs,
etc. Another potential benefit of our sampling based paradigm is
that it is generic and is applicable to all different kinds of
patterns.

Bio: Mohammad Hasan is a PhD candidate, Computer Science at RPI. His advisor is Dr. Mohammed Zaki. He holds an MS from the University of Minnesota, Twin Cities and a BSc(Engg) from Bangladesh University of Engineering and Technology, both in Computer Science. His broad research interest is data mining, information retrieval, machine learning, social network analysis, and bioinformatics. He has published more than 15 papers in premier data mining, machine learning and bioinformatics journals and conferences. He served as a program committee Member for the SIGKDD 2009 conference.