Reducing Tail Latency Using Duplication in a Cloud System

April 29, 2020
3:00
Zoom
Speaker: Hafiz Mohsin
Host: Fahad Dogar

Abstract

The performance of modern cloud applications (e.g., web search) is severely affected by stragglers. Duplication is an effective strategy for dealing with this problem, but it is often used conservatively because of the risk of overloading the system.

In this talk, I will discuss the challenges in using duplication that can lead to system overload and argue that these challenges stem from the lack of end-to-end support for duplication in the system. To this end, I will present duplicate-aware scheduling (DAS), a simple duplication policy that provides the benefits of duplication without overloading the system. Further, I will talk about D-Stage, a duplication abstraction that decouples the duplication policy from the mechanism and simplifies supporting DAS and many other duplication policies across diverse layers of a cloud system. Finally, I will go through some of the key results to show that DAS effectively deals with different types of stragglers in both public and private cloud settings and improves the performance of HDFS at the tail by up to 4.6x.

Zoom Link: https://tufts.zoom.us/j/98123050736

Meeting ID: 981 2305 0736

Please see the colloquia email for the password.

A video of the talk is now available:
https://tufts.zoom.us/rec/share/9J02HpLysUpJfLPhxUTwVoQtEon1eaa82ygXq_cKzku9cakmFgM8iiQc0WyagbOP