Streaming Data Analytics

December 13, 2018
3:00 PM
Halligan 102
Speaker: Erietta Liarou, Vertica
Host: Remco Chang

Abstract

Data analytics are everywhere. From social media to scientific data management to data-driven startups, applications need to minimize data-to-insight time. As new data arrives quickly, applications need to instantly recognize patterns of interest and combine these insights with past data. However, state-of-the-art data systems can do only one or the other: they either run continuous queries over small amounts of streaming data or run analytics over large amounts of previously collected data. We will present DataCell, a new system that can achieve top performance in both scenarios, eliminating the need to maintain two different data systems. At its core, DataCell builds on state-of-the-art analytics architectures, i.e., column-oriented systems, and shows how to transform such systems so that modern applications can natively do analytics and stream processing in a single platform.

Bio

Erietta Liarou received her Ph.D. in Computer Science from the University of Amsterdam in 2013. Her primary research interests include database architectures, transaction processing on modern hardware, stream processing, distributed query processing, and in-situ data analytics. She is currently a senior software engineer at Vertica Micro Focus. In the past, she has been with the Data Intensive Applications and Systems (DIAS) lab at EPFL, the Database Architectures group at CWI, the Data Systems Laboratory at Harvard SEAS, the System S group at IBM T. J. Watson Research Center, and the Intelligent Systems Laboratory at the Technical University of Crete.