Data Systems for Hybrid Analytics: Tuning, Optimizing, and Selecting the Right Way to Access Data
Science and business create an increasing amount of digital data every day; the challenge of storing and analyzing these ever-growing data collections is further amplified by increasing application complexity and technological shifts. In particular, analytical workflows need to support fast ingestion and fast read queries while data size is growing; in other words, data systems need to support increasingly heterogeneous workloads that weave read-mostly analytical operations and update-heavy transactional operations into hybrid transactional/analytical processing workloads. In this talk, I present a path towards building data systems for this new set of workloads, starting from their access methods. As the design goals typically entail reducing read latency, update latency, and space utilization, I attack the problem by studying the trade-offs between these design goals, and I propose solutions that balance the trade-offs through modeling and optimization. I use this methodology to build access methods for modern key-value stores and basic column storage, and to present how to perform access path selection in light of the changes in workloads and hardware. I conclude by abstracting the common concepts of access method designs and proposing a path for building data systems for hybrid analytics in the future.
Manos Athanassoulis is a postdoctoral researcher at DASlab, the Data Systems Laboratory at Harvard University, working on designing data systems and access methods for hybrid workloads and new hardware. Manos received his PhD from EPFL, Switzerland, and his undergraduate and Master’s degrees from the University of Athens, Greece. He has also been with IBM Watson Research Labs. For his work on access method design at Harvard, Manos won a "Best of ACM SIGMOD 2017" award and an "ACM Reproducibility award" at SIGMOD 2016. Manos is also the recipient of a Postdoc Mobility Fellowship from the Swiss National Science Foundation, a "Best of VLDB" selection for 2010, and an IBM Ph.D. Fellowship.