Featherweight concurrency in a portable assembly language

Norman Ramsey and Simon Peyton Jones

What abstractions should a reusable code generator provide to make it easy for a language implementor to compile a highly concurrent language? The implementation of concurrency is typically tightly interwoven with the code generator and run-time system of the high-level language. Our contribution is to tease out the tricky low-level concurrency mechanisms and to package them in an elegant way, so they can be reused by many front ends.

Here, then, is the problem we address: we want to make it easy for a front end to support efficient user-level threads, in such a way that all the policy decisions are in the hands of the front end, while all the tricky mechanism is supported by C--. Like many fine phrases, this goal is easier to state than achieve. The contribution of this paper is a design that extends C-- with clean, architecture-independent mechanisms that can be used to implement common concurrency policies efficiently.

Full paper

The paper is available as US Letter PostScript (218K), US Letter PDF (238K), and US Letter TeX DVI (85K).

Revisions

Since this paper was written, our thinking about C-- and concurrency has evolved. Here are some notes:

The NewContinuation procedure in the run-time interface (from Section 3.2) has evolved into a CreateStack procedure. This procedure receives a delimited stack and returns a continuation and a ``stack-limit cookie.'' As of Fall 2005, we have implemented a slightly simpler CreateThread primitive without the limit cookie. This primitive is described in Section 11.2.1 of the C-- Specification.
The story about global variables is even murkier than it was in 2001. We have therefore backed off the whole story of Section 5 and are considering the problem of global-register variables to be still unsolved.
We've moved forward a bit on the limitcheck primitive described in Section 7. Our thinking has evolved toward a slightly simpler, less powerful primitive as described in Section 11.2.2 of the C-- Specification. This section also talks about moving stack frames and redirecting pointers.
We believe we need to extend the run-time interface beyond what's in this paper. The details remain to be worked out, but the idea is simple: while retaining an efficient representation in which most frames are allocated contiguously, the run-time interface should give the client the ability to treat a call stack as a linked list. For example, it should be possible to insert and delete frames, to copy frames, and to move frames from one memory location to another. Such an interface should support a wide variety of techniques commonly used in language implementations:
- Small, contiguous thread stacks that can grow
- Segmented stacks
- ``Stacklets''
- First-class Scheme continuations
- One-shot continuations (as in Chez Scheme)
- Incremental stack scanning for real-time garbage collection