Learning-based Data Management Systems in the Big Data Era
Abstract
As the big data sets and computing infrastructures available today
continue to grow in volume and diversity, the complexity of data
management systems inevitably increases. At the same time, our ability
to comprehend and leverage this abundance of data and computing
resources effectively remains as limited as before. Nevertheless,
most of today’s data-driven applications and services still require
humans to make decisions on system orchestration, both at the
front end (e.g., to formulate exploratory queries) and at the
back end (e.g., to provision resources and distribute workloads).
Unfortunately, such decisions are often ad hoc or based on
“rules of thumb”, and thus frequently fail to deliver on the promise
of today’s technology innovations.
In this talk, I argue for a substantial shift away from human-crafted
solutions and towards systems that leverage data-science tools to gain
insight and automate data-driven systems. Towards this vision, I will
describe two learning-based data management services: (a) WiSeDB, a
cost management advisor that relies on supervised and reinforcement
learning to guide workload management actions for cloud databases, and
(b) AIDE, an interactive data exploration service that builds on
active learning to automatically steer users towards interesting data
areas. Both systems demonstrate how machine learning can lead to
highly versatile data management systems that automatically adapt to
user preferences, converge to performance expectations and tolerate
unexpected shifts in resource availability.
Speaker Bio: Olga Papaemmanouil has been an Assistant Professor in the Department of Computer Science at Brandeis University since 2009. She received an undergraduate degree in Computer Engineering and Informatics from the University of Patras, Greece, in 1999, an Sc.M. in Information Systems from the University of Economics and Business in Athens, Greece, in 2002, and a Ph.D. in Computer Science from Brown University in 2008. Her research interests lie in the area of data management, with a recent focus on big data analytics, cloud databases, data exploration, query optimization and query performance prediction. She is the recipient of an NSF CAREER Award (2013), an ACM SIGMOD Best Demonstration Award and a Paris Kanellakis Fellowship. She serves on the program committees of major database conferences such as SIGMOD, VLDB, ICDE and EDBT.