Discovery of Latent Factors in High-dimensional Data Using Tensor Methods

March 7, 2016
noon - 1pm
Halligan 102
Speaker: Furong Huang, University of California, Irvine
Host: Roni Khardon

Abstract

Latent or hidden variable models have applications in almost every domain, e.g., social network analysis, natural language processing, computer vision, and computational biology. Training latent variable models is challenging due to the non-convexity of the likelihood objective function. An alternative method is based on the spectral decomposition of low-order moment matrices and tensors. This versatile framework is guaranteed to estimate the correct model consistently. I will discuss my results on convergence to a globally optimal solution for stochastic gradient descent, despite the non-convexity of the objective. I will then discuss large-scale implementations of spectral methods, which are highly parallel and scalable, carried out on CPU/GPU and Spark platforms. We obtain gains in both accuracy and running time of several orders of magnitude compared to state-of-the-art variational methods.

I will discuss the following applications in detail: (1) learning hidden user commonalities (communities) in social networks, and (2) learning sentence embeddings for paraphrase detection using convolutional models. More generally, I have applied the methods to a variety of problems such as text and social network analysis, healthcare analytics, and neuroscience.

Bio: Furong Huang is a sixth-year Ph.D. candidate at UC Irvine working with Professor Anima Anandkumar. Her research interests lie in developing scalable and parallel algorithms for large-scale data using statistical models. Her work includes non-convex function optimization, such as finding tensor decompositions with global convergence guarantees using stochastic gradient descent; developing a fast detection algorithm to discover hidden and overlapping user communities in social networks; designing a parallel spectral tensor decomposition algorithm for detecting hidden topics in articles on Map-Reduce frameworks; and learning convolutional sparse coding models using tensor methods for extracting text sequence embeddings and image filter banks. Besides pure statistical computation, Furong has applied her machine learning techniques to biology. She learned a hierarchy of human diseases using a latent tree graphical model along with a hierarchical tensor decomposition method.

Recently she worked on cataloging neuronal cell types and their gene expression profiles in the mouse brain by extracting mixtures of spatial point processes from large-scale, cellular-resolution brain images. The project started during her internship at Microsoft Research New England with Jennifer Chayes and Christian Borgs, along with Srinivas Turaga from the Janelia Research Campus of the Howard Hughes Medical Institute. Furong recently received the "MLconf Industry Impact Student Research Award" for graduate students whose work has the potential to disrupt the industry in the future.