Repurposing relational databases for approximate probabilistic inference

March 16, 2017
Halligan 102
Speaker: Wolfgang Gatterbauer, Carnegie Mellon University
Host: Rob Jacob


Performing inference over large uncertain data sets has become a central problem in data management with applications in related fields, such as machine learning, AI and statistics. With recent probabilistic knowledge bases having millions to billions of uncertain tuples, there is a pressing need for general-purpose and scalable solutions. Since general reasoning under uncertainty is highly intractable, many state-of-the-art systems perform approximate inference with sampling today.

This talk presents an alternative approach that uses only basic operators of relational database management systems (i.e., no sampling required). The first part of the talk develops optimal oblivious bounds for the probability of Boolean functions by treating multiple occurrences of variables as independent and assigning them new probabilities. The second part then applies these bounds to reduce the problem of approximate inference over probabilistic databases to a standard multi-query evaluation problem. We will see experimental evidence that this approach can be orders of magnitude faster and also more accurate than sampling-based approaches for ranking top query answers.

Bio: Wolfgang Gatterbauer is an Assistant Professor in the Tepper School of Business at Carnegie Mellon University, and by courtesy in the Computer Science Department of Carnegie Mellon University. He received his PhD in Computer Science from Vienna University of Technology and did a Post-Doc with Dan Suciu at University of Washington. Wolfgang's work focuses on ways to extend the capabilities of modern data management systems to support new forms of data, in particular uncertain data. He is the recipient of a CAREER award from the National Science Foundation and a "best-of-conference" mention from VLDB 2015.