Streaming Data Analytics
Abstract
Data analytics are everywhere. From social media to scientific data management to data-driven startups, applications need to minimize data-to-insight time. As new data arrives quickly, applications need to instantly recognize patterns of interest and combine these insights with past data. However, state-of-the-art data systems can do only one or the other: they either run continuous queries over small amounts of streaming data or run analytics over large amounts of previously collected data. We will present DataCell, a new system that can achieve top performance in both scenarios, eliminating the need to maintain two different data systems. At its core, DataCell builds on state-of-the-art analytics architectures, i.e., column-oriented systems, and shows how to transform them so that modern applications can natively perform analytics and stream processing on a single platform.