Learning-based Data Management Systems in the Big Data Era

April 6, 2017
Halligan 102
Speaker: Olga Papaemmanouil, Brandeis University
Host: Robert Jacob


As the big data sets and computing infrastructures available today continue to grow in volume and diversity, the complexity of data management systems inevitably increases. At the same time, our ability to comprehend and leverage this abundance of data and computing resources effectively remains as limited as before. Nevertheless, most of today’s data-driven applications and services still require humans to make decisions on system orchestration, both at the front end (e.g., to formulate exploratory queries) and at the back end (e.g., to provision resources and distribute workloads). Unfortunately, such decisions are often ad hoc or based on “rules of thumb”, and thus often fail to achieve the promise of today’s technology innovations.

In this talk, I argue for a substantial shift away from human-crafted solutions and towards systems that leverage data-science tools to gain insight and automate data-driven systems. Towards this vision, I will describe two learning-based data management services: (a) WiSeDB, a cost management advisor that relies on supervised and reinforcement learning to guide workload management actions for cloud databases, and (b) AIDE, an interactive data exploration service that builds on active learning to automatically steer users towards interesting data areas. Both systems demonstrate how machine learning can lead to highly versatile data management systems that automatically adapt to user preferences, converge to performance expectations and tolerate unexpected shifts in resource availability.

Speaker Bio: Olga Papaemmanouil has been an Assistant Professor in the Department of Computer Science at Brandeis University since 2009. She received an undergraduate degree in Computer Engineering and Informatics at the University of Patras, Greece, in 1999, an Sc.M. in Information Systems at the University of Economics and Business, in Athens, Greece, in 2002, and a Ph.D. in Computer Science at Brown University, in 2008. Her research interests lie in the area of data management, with a recent focus on big data analytics, cloud databases, data exploration, query optimization and query performance prediction. She is the recipient of an NSF Career Award (2013), an ACM SIGMOD Best Demonstration Award and a Paris Kanellakis Fellowship. She serves on the program committees of major database conferences such as SIGMOD, VLDB, ICDE and EDBT.