Norman Ramsey - Research Activities

This page sketches my research activities and shows how to get the papers that have the details. Information is organized by topic as far as possible, although some work could reasonably appear under more than one topic. A chronological list of papers is also available.

Topics covered on this page include:

Retargetability

Machine Code

C--, a Portable Assembly Language

Binary Translation

Debugging

Language Design

Interpreters

Literate Programming

Spider - generating language-dependent tools

Noweb - a simple, language-independent tool

Formal Methods

Miscellany

Concurrency

Error management in compilers

Unparsing

Retargetability

I am intrigued by the problems of retargeting machine-dependent software to different hardware platforms. This topic is of especial interest to those who create such software, since it is hard for improvements in debuggers, linkers, etc. to have an impact unless the software works on multiple platforms. My contributions include a retargetable debugger and a toolkit for recognizing and generating machine instructions. A research summary is available.

Machine code

Representations of machine instructions

ldb's breakpoint implementations required a rudimentary facility for recognizing machine instructions. Attempts to generalize that facility led to the New Jersey Machine-Code Toolkit, which uses a new formalism to describe binary encodings. In one sentence, it gives application writers the ability to manipulate true binary representations of machine code while programming at an assembly-language level of abstraction. There are two papers describing the main ideas:

The New Jersey Machine-Code Toolkit. Proceedings of the 1995 USENIX Technical Conference, New Orleans, LA, January 1995, pp 289-302 (with Mary F. Fernández). [Abstract] [HTML version].
An overview, and the place to begin.
Specifying Representations of Machine Instructions. ACM Transactions on Programming Languages and Systems, 19(3):492-524, May 1997 (with Mary F. Fernández). [Abstract]
This is a comprehensive presentation of SLED, the toolkit's specification language. It also includes a little bit of information on what it's like to use the toolkit and how the toolkit is implemented. You can see an [Abstract] and an HTML version.

Those who actually want to use the toolkit will probably find the following technical reports useful:

New Jersey Machine-Code Toolkit architecture specifications. Technical Report TR-470-94, Department of Computer Science, Princeton University, October 1994 (with Mary F. Fernández).
These annotated descriptions of the MIPS, SPARC, and Intel Pentium show the toolkit's specification language in action. Highly recommended for anyone planning to write machine descriptions for the toolkit.
New Jersey Machine-Code Toolkit reference manual. Technical Report TR-471-94, Department of Computer Science, Princeton University, October 1994 (with Mary F. Fernández).
The manual describes the complete specification language and explains how to use version 0.2.

The toolkit project has also led to a machine-independent method of relocating instructions, described in

Relocating machine instructions by currying. ACM SIGPLAN '96 Conference on Programming Language Design and Implementation, 226--236, May 1996. [Abstract]

The toolkit's equation solver is described in

A simple solver for linear equations containing nonlinear operators. Software---Practice & Experience, 26(4):467--487, April 1996. An earlier version appeared as Technical Report 95-068, Purdue University, Dept of Computer Sciences, November 1995.

The method we use to test toolkit specifications is described in

Automatic Checking of Instruction Specifications. 1997 International Conference on Software Engineering, pp 326-336, May 1997 (with Mary F. Fernández).

My research summary for retargeting describes the toolkit. The toolkit home page provides entrée to the software and its documentation.

Semantics of machine instructions

I've developed Lambda-RTL, a new metalanguage for specifying the semantics of machine instructions. You can read about goals and some preliminary results in

Machine Descriptions to Build Tools for Embedded Systems (with Jack W. Davidson). In ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems (LCTES'98), volume 1474 of LNCS, pages 172--188. Springer Verlag, June 1998.

The semantic descriptions are based on register transfers. Unlike other tools, Lambda-RTL produces fully explicit register transfers, in which the sizes of all operands and results are explicit, as well as the byte order used for all fetches and stores. The Lambda-RTL translator uses type inference with subtyping to add this information to specifications written in a simple metalanguage. Nascent applications include

The Zephyr project, which is part of the National Compiler Infrastructure. (Joint work with Jack Davidson.)
The University of Queensland Binary Translators. (Joint work with Cristina Cifuentes.)
The Quick C-- compiler. (Joint work with Simon Peyton Jones and Christian Lindig.)

More information is available on the Zephyr Lambda-RTL pages.

C--, a Portable Assembly Language

Compiler-writers often use C as a portable assembly language, so they can avoid the pain of writing multiple optimizing back ends. For this purpose, C's outstanding virtue is its ubiquity. But C was designed as a high-level programming language, and as a compiler-target language it has many deficiencies. In particular, it lacks support for tail calls, exception handling, garbage collection, and lightweight concurrency. I am therefore working with Simon Peyton Jones on the design of C--, a language that is designed as a compiler-target language. My particular interest is in compile-time and run-time support for garbage collection, exceptions, and debugging.

Simon and I have written three papers about C--, but we're still not satisfied. Please send feedback!

Machine-independent support for garbage collection, debugging, exception handling, and concurrency (draft). Technical Report CS-98-19, Department of Computer Science, University of Virginia, August 1998 (with Simon L. Peyton Jones).
An extensive proposal for extending C-- to support high-level run-time services. Parts are superseded by the next two papers.
C--: a portable assembly language that supports garbage collection. Invited paper, International Conference on Principles and Practice of Declarative Programming, LNCS 1702, pp 1-28, September 1999 (with Simon L. Peyton Jones and Fermin Reig).
A refinement of some of the ideas in the large technical report. Explains what C-- is and how it may be used to support such high-level run-time services as garbage collection, exception handling, debugging, profiling and concurrency. Only garbage collection is handled in detail.
An extensive proposal for extending C-- to support high-level run-time services. Parts are superseded by the exceptions paper, next.
A Single Intermediate Language That Supports Multiple Implementations of Exceptions (with Simon L. Peyton Jones). To appear in PLDI 2000.
A revised and simplified treatment of exceptions within the context of C--. This supersedes parts of the early proposal, but it is much more narrowly focused. One significant improvement is a formal operational semantics for much of C--.

Christian Lindig and I are buildng the Quick C-- implementation.

Binary Translation

I'm working with Cristina Cifuentes on a retargetable platform for binary translation. Cristina has put together a nice overview page. This work is supported by the Australian Research Council. We have written two papers:

A transformational approach to binary translation of delayed branches (with Cristina Cifuentes). Submitted to Science of Computer Programming, February 1999.
How the translator copes with delayed-branch instructions.
The design of a resourceable and retargetable binary translator. In Working Conference on Reverse Engineering (WCRE'99), October 1999 (with Cristina Cifuentes and Mike van Emmerik).
A sketch of the system's design.

Debugging

The goal of the Debugging Everywhere project is to make debugging a cheap, ubiquitous service. The idea behind the project is to use Active Debugging Information, i.e., debugging symbols containing executable content. Rather than have the compiler give the debugger a bunch of source-level information about the program, forcing the debugger to figure out what it means on a particular machine, we have the compiler provide the debugger with code it can execute directly to do things like evaluate expressions and print values. This approach will have two benefits:

Considerable simplification in the debugger, making it nearly independent of the source programming language and target machine.
Simplification for the compiler writer. Because the debugging information need not be represented in a limited standard like ``stabs'' or DWARF, the debugging information can be represented in whatever form is convenient for the compiler.

The project is based on earlier work on ldb, a retargetable debugger. I've written two papers and a thesis:

A retargetable debugger. ACM SIGPLAN '92 Conference on Programming Language Design and Implementation,in SIGPLAN Notices, 27(7):22-31, July 1992 (with David R. Hanson). [Abstract]
An overview of a work in progress. It has little explanation of expression evaluation or of breakpoints, since they weren't implemented yet.
A Retargetable Debugger. PhD thesis, Princeton University, Department of Computer Science, December 1992. [Abstract]
The definitive reference, but embarrassingly prolix.
Correctness of trap-based breakpoint implementations. In Proceedings of the 21st ACM Symposium on the Principles of Programming Languages, pages 15-24, Portland, Oregon, January 1994. [Abstract]
This literate program formalizes ldb's breakpoint implementation for the MIPS and SPARC, which uses strategically placed trap instructions. The paper also shows an apparently correct implementation that could actually miss breakpoint events.

My research summary for retargeting describes ldb.

Interpreters

There's nothing like a quick interpreter to try out a new language idea. I am especially interested in embedding interpreters into functional languages. I have written Lua-ML, which is an implementation of the Lua scripting language in Objective Caml. It is very useful for scripting and configuring Caml programs; we use it to configure our compiler for C--.

I have written two papers on Lua-ML:

Embedding an Interpreted Language Using Higher-Order Functions and Types. In ACM SIGPLAN 2003 Workshop on Interpreters, Virtual Machines and Emulators, pages 6-14, June 2003.
Homage to Olivier Danvy: How to embed an application-specific function into an interpreter simply by describing its type. Helpful in understanding Lua-ML.
ML Module Mania: A Type-Safe, Separately Compiled, Extensible Interpreter. To appear in the ACM SIGPLAN ML Workshop, September 2005.
A functional pearl that explains Lua-ML's extension mechanism: separately compiled libraries. For modules hackers everywhere. .

Language Design

Most of my work is in implementation, but I am also interested in language design.

Stochastic Lambda Calculus and Monads of Probability Distributions (with Avi Pfeffer). Conference Record of the 29th Annual ACM Symposium on Principles of Programming Languages (POPL'02), in SIGPLAN Notices 37(1):154--165.
Semantic foundations for probabilistic languages.
Toward a Calculus of Signatures.
This is work in progress not yet submitted anywhere. I'm scratching an itch I've had for years; what I want in an ML-like language is not objects but a better calculus of signatures. Kathleen Fisher and I are trying to refine this paper into a sensible proposal.

Literate Programming

Literate programming is the act of writing programs primarily as documents to be read by human beings, and only secondarily as instructions to be executed by computer. The term was coined by Don Knuth to describe WEB, the tool he created to write TeX. My contributions to literate programming have included analyzing tools, building tools to support multiple programming languages, and simplifying tools by recognizing and eliminating the inessential. I have also published some literate programs:

Correctness of trap-based breakpoint implementations. In Proceedings of the 21st ACM Symposium on the Principles of Programming Languages, pages 15--24, Portland, OR, January 1994.
A simple solver for linear equations containing nonlinear operators. Software---Practice & Experience, 26(4):467--487, April 1996. (An earlier version appeared as Technical Report 95-068, Purdue University, Dept of Computer Sciences, November 1995.)
Eliminating spurious error messages using exceptions, polymorphism, and higher-order functions, Computer Journal, 42(5):360-372, 1999.

Spider - Generating WEBs for different languages

In 1986, you couldn't write a literate program in a new programming language without first writing a new tool. Incompatible, language-dependent versions of WEB were starting to proliferate. I created Spider, a WEB generator, which can create a WEB system for any language whose lexical and syntactic structure are roughly Algol-like. I described Spider in a ``Literate Programming'' column for CACM, and the system is documented in two Princeton technical reports:

Literate programming: Weaving a language-independent WEB. Communications of the ACM, 32(9):1051-1055, September 1989.
A Spider user's guide. Technical Report TR-225-89, Department of Computer Science, Princeton University, August 1989.
The Spidery WEB system of structured documentation. Technical Report TR-226-89, Department of Computer Science, Princeton University, August 1989.

Spidery WEB, like the other members of the WEB family, is too complex (see paper below), so I discourage readers from digging up the manuals or the software. noweb is a good alternative.

Experience with WEB

We actually used Spidery WEB for a real project. One novelty was the idea that literate programming might be a useful software-engineering technique, even though we never had any intention of creating a literate program of publishable quality. For example, it was helpful to use the literate program to bring new project members up to speed. A colleague and I described the experience in:

Literate programming on a team project. Software---Practice & Experience, 21(7):677-683, July 1991 (with Carla Marceau). [Abstract]

Readers may find this article valuable for its enumeration of criteria for evaluating literate-programming tools and for its critique of WEB. It is, so far as I know, the only published description of industrial experience with literate programming.

noweb - Simplification and language-independence

I created the noweb literate-programming tool to address the shortcomings of WEB, as identified during the experience with Spidery WEB. My contributions include making the first language-independent literate-programming tool, developing a simpler, more general model of literate program, identifying and stripping away all inessential features of WEB, and creating an extensible implementation by using a pipelined architecture. These contributions are described for the non-specialist in this article:

Literate programming simplified. IEEE Software, 11(5):97-105, September 1994.

The noweb home page provides entrée to the software and its documentation.

Formal Methods

At one time, I worked on formal verification of Ada programs, as described in

Developing formally verified Ada programs. In Proceedings of the 5th International Workshop on Software Specification and Design, pages 257-265, Pittsburgh, Pennsylvania, May 1989. [Abstract]

The original was set for large paper and then photoreduced; I'm afraid I've supplied a rather botched reprint whose only virtue is that it fits on standard US letter paper.

I got interested in file synchronization and developed a new specification technique.

An Algebraic Approach to File Synchronization (with Elod Csirmaz). Foundations of Software Engineering, 175--185, September 2001.

If you're interested in this topic, check out Unison. Those guys were first, and we're swimming in their wake.

The paper on breakpoint correctness is an application of formal methods.

Miscellany

Concurrency

As a student project, I explored how combining polymorphism, garbage collection, and the Standard ML modules system could make it easy to write concurrent programs. The technical report

Concurrent programming in ML. Technical Report TR-262-90, Department of Computer Science, Princeton University, April 1990.

describes what I came up with. The closest thing to a research contribution was an implementation idea that John Reppy used in his CML system, but some people find the paper entertaining anyway.

Error management in compilers

Have you ever been blasted by a screenful of compiler error messages, only to discover that the first message was the only one that meant anything? If so, you'll be glad to read about

Eliminating spurious error messages using exceptions, polymorphism, and higher-order functions. Computer Journal, 42(5):360-372, 1999.

Unparsing

The unparsing problem is to take an internal representation of a program and produce a concrete external representation that, when parsed, reproduces the original representation. The trick is to come up with a readable representation. The paper

Unparsing expressions with prefix and postfix operators. Software---Practice &Experience, 28(12):1327--1356, October 1998.

explains how to solve the unparsing problem, focusing on the elimination of unnecessary parentheses.

Program generation

Most of my work in machine-level software has involved program generators, and the New Jersey Machine-Code Toolkit is a serious program generator. The paper

Pragmatic aspects of reusable program generators. Submitted to Semantics, Application, and Implementation of Program Generation.

describes some of the difficulties of making program generators reusable.

Programming Contests

Kevin Scott and I ran the 1999 ICFP Programming Contest. The paper

The 1999 ICFP programming contest. SIGPLAN Notices 35(3):73--83, March 2000 (with J. Kevin Scott).

describes the problem, the field, and the winning entries.

File synchronization

My experience with the Unison file synchronizer got me interested in the problem of specifying the behavior of such a synchronizer. The paper

An Algebraic Approach to File Synchronization (with Elod Csirmaz)

presents some preliminary results.

Physics

In a former life, I was a physicist. Write me if you want either of these papers:

Formation and breakup rates of RbXe van der Waals molecules in He and N2 gas. Chemical Physics Letters, 102:340, 1983 (with Will Happer et al.).
A search for capacitance fluctuations in tunnel capacitors (M.S. thesis). Technical report, Materials Science Center, Cornell University, August 1986.

Copyright

The documents contained in these pages are included to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Back to Norman Ramsey's home page.