Debugging Everywhere

The goal of the Debugging Everywhere project is to make debugging a cheap, ubiquitous service. We intend to begin by getting compilers to emit Active Debugging Information, which we expect will support multi-language, multi-platform debugging much more readily than older approaches like DWARF or dbx ``stabs.''

Project Summary

Many future applications will run on heterogeneous networks and will be composed of components written in different programming languages. Unfortunately, debugging technology is not ready to support such applications. Today's most widely used debuggers work with single-language programs running on single machine platforms. We will soon need debuggers that work seamlessly with multiple platforms and multiple languages, and to which we can add support for new platforms and new languages at very low cost.

The current, standard approach to multi-language debugging makes it hard to achieve this goal. In this approach, debugging formats describe source-level data using a ``union'' model, which is intended to describe all types in all languages. Union models are complex, and it is hard for them to cover unanticipated features of new languages. Because a union model is high-level, the debugger must know how to map it to each target machine. Adding a new machine is expensive, because it must support a large, complex model.

To solve these problems, we plan a radical change in the way compilers support debugging. Instead of presenting the debugger with facts about the program, forcing it to figure out how the program maps to the target machine, the compiler can provide the debugger directly with the capabilities it needs to do its job. In particular, instead of giving the debugger information about the program, the compiler can give the debugger code that it can run to probe the program.

This idea can work because the debugger's needs are few. The set of operations a debugger performs is small, and it is independent of source languages and target machines. These operations suffice for source-level debugging:

Given these operations, the debugger can support multiple programming languages without being aware of which features are supported in which language.

We can provide low-cost debugging by having the compiler, not the debugger, provide implementations of these critical operations. For example, the compiler can supply a symbol-table object with a ``lookup symbol'' method. The symbols can include methods to print declarations, to print values, and to reconstruct the compiler's private representations of symbols. At debug time, the debugger can ask the compiler to evaluate expressions and assignments. The compiler need only be modified to request unknown symbols from the debugger---the debugger can reconstruct the compiler's private representations, which the compiler can use to translate an expression or assignment into something that the debugger can run. Reusing the compiler is much cheaper than implementing a language-dependent interpreter in the debugger, and it ensures that the compiler and debugger implement the same semantics.
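As a rough illustration of this division of labor, the sketch below shows a symbol table whose entries carry their own ``print declaration'' and ``print value'' methods. It is not taken from ldb or any real compiler: the names Symbol, SymbolTable, emit_c_int, debugger_print, and debugger_whatis are hypothetical, and target memory is modeled as a plain dictionary from addresses to 32-bit words.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical stand-in for target memory: address -> 32-bit word.
Memory = Dict[int, int]


@dataclass
class Symbol:
    """A symbol whose behavior is supplied by the compiler, not the debugger."""
    name: str
    print_decl: Callable[[], str]          # how to print the declaration
    print_value: Callable[[Memory], str]   # how to print the current value


class SymbolTable:
    """Compiler-supplied table offering a ``lookup symbol'' method."""

    def __init__(self) -> None:
        self._symbols: Dict[str, Symbol] = {}

    def add(self, sym: Symbol) -> None:
        self._symbols[sym.name] = sym

    def lookup(self, name: str) -> Optional[Symbol]:
        return self._symbols.get(name)


def emit_c_int(name: str, address: int) -> Symbol:
    """What a C compiler might emit for `int name` stored at `address`."""
    def value(mem: Memory) -> str:
        word = mem[address] & 0xFFFFFFFF
        if word >= 0x80000000:          # reinterpret as signed 32-bit
            word -= 0x100000000
        return str(word)
    return Symbol(name, lambda: f"int {name};", value)


# The debugger side knows nothing about C; it only calls the methods.
def debugger_print(table: SymbolTable, mem: Memory, name: str) -> str:
    sym = table.lookup(name)
    if sym is None:
        return f"no symbol {name}"
    return f"{name} = {sym.print_value(mem)}"


def debugger_whatis(table: SymbolTable, name: str) -> str:
    sym = table.lookup(name)
    return sym.print_decl() if sym is not None else f"no symbol {name}"
```

In this sketch, supporting another language would mean writing another emitter alongside emit_c_int in that language's compiler; debugger_print and debugger_whatis would stay unchanged.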

This method of debugging support requires active debugging information---information with executable content. This content may be represented as code in a simple interpreted language. Using executable content, a debugger could support a new programming language at nearly zero cost---perhaps without changing the debugger at all. It should also be possible for a debugger to support a new hardware platform at small, constant cost---not cost proportional to the number of languages or data types supported.
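To make ``code in a simple interpreted language'' concrete, here is a minimal sketch of one possible representation: a tiny stack language whose programs are emitted by the compiler and run by the debugger. The operator names (fetch32, signed32, add, tostring), the example addresses, and the run function are all assumptions made for illustration; they do not describe the representation used by any existing debugger.

```python
from typing import Callable, Dict, List, Union

Token = Union[int, str]


def run(program: List[Token], fetch: Callable[[int], int]) -> list:
    """Interpret a program in a tiny, hypothetical stack language.

    `fetch` is the only machine-dependent primitive: it reads one
    32-bit word of target memory at the given address.
    """
    stack: list = []
    for tok in program:
        if isinstance(tok, int):
            stack.append(tok)                      # literal: push it
        elif tok == "fetch32":
            addr = stack.pop()
            stack.append(fetch(addr) & 0xFFFFFFFF)
        elif tok == "signed32":
            w = stack.pop()
            stack.append(w - 0x100000000 if w >= 0x80000000 else w)
        elif tok == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif tok == "tostring":
            stack.append(str(stack.pop()))
        else:
            raise ValueError(f"unknown operator {tok!r}")
    return stack


# Programs a compiler might emit (addresses are made up):
PRINT_COUNT = [0x1000, "fetch32", "signed32", "tostring"]          # int count
PRINT_FIELD = [0x2000, 4, "add", "fetch32", "signed32", "tostring"]  # p.y at offset 4

# A stand-in target: one dictionary plays the role of memory.
MEMORY: Dict[int, int] = {0x1000: 0xFFFFFFFE, 0x2004: 7}


def fetch(addr: int) -> int:
    return MEMORY[addr]
```

Under these assumptions, run(PRINT_COUNT, fetch) leaves the string "-2" on the stack. Retargeting this sketch to a new machine would mean supplying a different fetch, which is a fixed amount of work regardless of how many languages or data types the emitted programs describe.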

As a proof of concept, we have built a prototype retargetable debugger based on active debugging information. The prototype, ldb, can debug ANSI C programs on MIPS, SPARC, VAX, and Motorola 68000 platforms. This multi-platform support comes at low cost: about 500-600 lines of code per machine, versus an average of over 2,800 lines to retarget gdb. Research under this project will investigate using active debugging information to support multiple programming languages, also at low cost.

We will also investigate the costs of providing executable content: the programming cost required to modify the compiler, the added compile time required to emit executable content, and the increased size of object code required to hold executable content. All of these costs depend on the representation of executable content, which will be a focus of the investigation. For example, greater programming effort may be required to support smaller, faster representations. Representations that fit typical compiler organizations may reduce programming effort and/or speed compilation. An acceptable representation will have costs comparable to standard methods; an excellent representation will be cheap enough to include debugging information in all executables. Ubiquitous debugging is especially valuable in a distributed context, where bugs may be difficult to reproduce.

The investigation should result in new techniques that compiler writers can use to support debugging. These techniques will work in a multi-language, multi-platform context, at costs that make debugging attractive to both the compiler writer and the end user.

This project is supported by the National Science Foundation, grant number EIA-9974967.


Back to Norman Ramsey's home page.