Fast Cross-Request Analysis and Contention Discovery

May 5, 2023
11:00am EST
Cummings, #610
Speaker: Tomislav Zabcic-Matic - Quals Talk
Host: Raja Sambasivan

Abstract

Quals talk:

Resource contention is the source of many significant performance problems in distributed systems, and diagnosing it presents a broad set of challenges that conventional distributed-system observability frameworks cannot adequately address. When debugging performance problems caused by contention, developers need to discover the causes of that contention quickly. However, the inability to observe application behavior across concurrent requests hinders efforts to localize and diagnose such problems. In this work, we address one prevalent type of resource contention, queue-based contention, which affects applications at every layer, from application code down to the kernel and hardware. We present insights into localizing areas of contention across request workflows, and we investigate whether contention can be localized and diagnosed by observing only application-level logs. We then review the design space for a database system that supports queries intended to help localize contention. We also present our initial work on narrowing down a solution within that design space, with the goal of enabling fast, online discovery of potentially contending sections of concurrent request workflows. Finally, we present a preliminary evaluation of our hypothesis using a distributed relational database that indexes key timing properties of distributed-system logs to accelerate time-based filtering queries in a high-volume logging environment.

Research area: Distributed systems, observability