Metrics and Scheduling Algorithms for Next Generation Data Stream

April 4, 2007
2:50 pm - 4:00 pm
Halligan 111B

Abstract

Data stream processing is an emerging research area driven by the growing need for monitoring applications. Deploying monitoring applications efficiently requires advanced data processing techniques that can support the continuous processing of unbounded, rapid data streams. This need has led to a new data processing paradigm and a new generation of data processing systems that support continuous queries on data streams. In the development of first-generation Data Stream Management Systems (DSMSs), the primary emphasis was on basic functionality. However, for DSMSs to reach the maturity of DBMSs, greater attention should be paid to performance and user requirements.

In this talk, I will propose several quality of service (QoS) and quality of data (QoD) metrics for quantifying the performance of a DSMS when it is used to support a wide range of business or scientific applications. I will also present novel algorithms for scheduling multiple continuous queries that aim to improve the performance of the DSMS with respect to the proposed metrics. Further, I will discuss how these scheduling algorithms can be efficiently implemented and extended to exploit particular characteristics of continuous queries. Finally, I will present the results of an extensive experimental study, using real and synthetic data, which shows that the proposed algorithms consistently outperform the existing state of the art.

Bio

Mohamed Sharaf is currently a Ph.D. candidate in the Department of Computer Science at the University of Pittsburgh. He received his B.Sc. in 1997 and his M.Sc. in 2000, both in Computer Engineering from Cairo University. In 2004, he received his M.Sc. in Computer Science from the University of Pittsburgh. He was the recipient of the Taulbee Award for Excellence in Computer Science in 2002 and a two-time winner of the Andrew Mellon Predoctoral Fellowship. Mohamed's research interests lie in the general area of Data Management Systems with a focus on developing scalable data processing techniques that support time-sensitive, data-intensive, pervasive applications. His research has addressed several topics in data management including data stream management systems, sensor data processing, mobile and pervasive data management, data warehousing, and Web databases.