Techniques for Detecting, Avoiding, and Understanding Concurrency Errors

March 30, 2011


ABSTRACT: Programmers are the weak link in the chain of concurrent software development. People make mistakes that lead to subtle concurrency errors that are difficult to find and fix. These errors degrade software reliability and can lead to costly system failures. Moving concurrent programming into the mainstream depends on solving the problems posed by concurrency errors. In this talk I will describe two different approaches to this problem.

First, I will discuss a technique for automatically isolating patterns of inter-thread shared-memory communication that are likely the cause of buggy program behavior. The key contribution of this work is a new graphical data-flow abstraction called a context-aware communication graph. We use statistical reasoning over sets of these graphs to determine the root cause of software failures. We develop a set of hardware extensions that enable graph collection with negligible performance overhead.
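As a rough illustration of the idea, the sketch below builds a simplified communication graph from a recorded access trace and ranks edges by how much more often they appear in failing runs than in passing ones. Everything in it is an assumption made for illustration: the trace format, the names communication_graph and rank_edges, the definition of context as a thread's last few local and remote access kinds, and the difference-of-frequencies score. It is a software toy, not the hardware mechanism or the statistical method presented in the talk.

```python
from collections import Counter, deque

def communication_graph(trace, context_len=2):
    """Build a set of edges (writer_node, reader_node) from one execution.

    trace: list of (thread, op, addr, instr) with op in {"R", "W"}.
    A node is (instr, context), where context is the tuple of the thread's
    most recent access kinds ("LocR", "LocW", "RemR", "RemW") at that point.
    This notion of context is a hypothetical simplification.
    """
    threads = {t for t, _, _, _ in trace}
    context = {t: deque(maxlen=context_len) for t in threads}
    last_writer = {}  # addr -> (thread, instr, context at the time of the write)
    edges = set()
    for thread, op, addr, instr in trace:
        ctx = tuple(context[thread])
        if op == "W":
            last_writer[addr] = (thread, instr, ctx)
        elif addr in last_writer:
            w_thread, w_instr, w_ctx = last_writer[addr]
            if w_thread != thread:  # only inter-thread communication forms an edge
                edges.add(((w_instr, w_ctx), (instr, ctx)))
        # Record the access as local for this thread and remote for the others.
        for t in threads:
            context[t].append(("Loc" if t == thread else "Rem") + op)
    return edges

def rank_edges(failing_graphs, passing_graphs):
    """Order edges by (fraction of failing runs) - (fraction of passing runs)."""
    fail = Counter(e for g in failing_graphs for e in g)
    ok = Counter(e for g in passing_graphs for e in g)
    n_fail = max(len(failing_graphs), 1)
    n_ok = max(len(passing_graphs), 1)
    scores = {e: fail[e] / n_fail - ok[e] / n_ok for e in fail}
    return sorted(scores, key=scores.get, reverse=True)

# Thread 1 checks a pointer and then uses it; thread 2 clears it.
passing = [(0, "W", "ptr", "init"), (1, "R", "ptr", "check"),
           (1, "R", "ptr", "use"), (2, "W", "ptr", "clear")]
failing = [(0, "W", "ptr", "init"), (1, "R", "ptr", "check"),
           (2, "W", "ptr", "clear"), (1, "R", "ptr", "use")]

ranked = rank_edges([communication_graph(failing)], [communication_graph(passing)])
writer_node, reader_node = ranked[0]
print(writer_node[0], "->", reader_node[0])  # the clear -> use edge ranks first
```

In this toy trace the edge from the clearing write to the later use appears only in the failing run, so it rises to the top, while the edge from initialization to the check appears in both runs and scores zero.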

Second, I will discuss architectural support we have developed for automatically identifying and avoiding atomicity violations, a common class of concurrency errors. We leverage prior work on the relationship between atomicity and serializability analysis to find atomicity bugs. We encode our analysis in a set of architectural extensions to a computer system that monitor a concurrent execution. Upon finding a likely atomicity problem, these extensions use dynamic atomic regions to prevent erroneous behavior. In addition, we describe a useful generalization of standard serializability theory that allows us to detect more complex errors involving accesses to multiple memory locations.
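To make the link between atomicity and serializability concrete, here is a minimal software sketch that scans a recorded trace for single-variable interleavings that cannot be serialized: two consecutive accesses by one thread to a location, with a conflicting access by another thread in between. The four flagged patterns are the ones commonly used in prior work on atomicity-violation detection. The trace format and the name find_unserializable are assumptions for illustration. The sketch also assumes every pair of consecutive local accesses was meant to be atomic, which a real tool cannot take for granted, and it models neither the architectural extensions, nor dynamic atomic regions, nor the multi-location generalization described above.

```python
# (first local op, interleaved remote op, second local op) triples for which
# no serial order of the two threads produces the same result.
UNSERIALIZABLE = {
    ("R", "W", "R"),  # the two local reads see different values
    ("W", "W", "R"),  # local read does not see the local write
    ("W", "R", "W"),  # remote read sees an intermediate value
    ("R", "W", "W"),  # local write overwrites the remote update based on a stale read
}

def find_unserializable(trace):
    """Return (j, k, i) index triples that form an unserializable interleaving.

    trace: list of (thread, op, addr) with op in {"R", "W"}.
    j and i are consecutive accesses by one thread to addr; k is an access
    by a different thread to the same addr that falls between them.
    """
    prev = {}  # (thread, addr) -> index of that thread's previous access to addr
    hits = []
    for i, (thread, op, addr) in enumerate(trace):
        j = prev.get((thread, addr))
        if j is not None:
            first_op = trace[j][1]
            for k in range(j + 1, i):
                other, mid_op, mid_addr = trace[k]
                if (other != thread and mid_addr == addr
                        and (first_op, mid_op, op) in UNSERIALIZABLE):
                    hits.append((j, k, i))
        prev[(thread, addr)] = i
    return hits

# Thread 1 reads a balance and writes it back; thread 2 updates it in between.
buggy = [(1, "R", "balance"), (2, "W", "balance"), (1, "W", "balance")]
serial = [(1, "R", "balance"), (1, "W", "balance"), (2, "W", "balance")]

print(find_unserializable(buggy))   # [(0, 1, 2)]
print(find_unserializable(serial))  # []
```

A detector in this style only reports the interleaving after the fact. Avoiding the failure, as the talk describes, additionally requires acting before the second local access completes.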

BIO: Brandon Lucia is a fourth-year PhD student at the University of Washington, advised by Luis Ceze. Brandon's research focuses on two things: (1) detecting software errors automatically and relating them to programmers in a comprehensible way; and (2) developing computer systems that can automatically avoid the incorrect behavior that can result from software errors. The techniques developed in Brandon's work span the system stack, including support in the hardware and architecture, system/runtime level mechanisms, and system support to simplify the specification of modern programming languages. Brandon's continuing interest is in making the process of creating software -- especially concurrent software -- as simple as possible, even if achieving this goal requires some rethinking of the way computer systems are designed. Brandon has fond memories of his many hours spent in the depths of Halligan as an undergraduate, and hopes that computer science has, at last, reclaimed the second floor from the athletics department.