Norman Ramsey - Research Activities

This page sketches my research activities and shows how to get the papers that have the details. Information is organized by topic as far as possible, although some work could reasonably appear under more than one topic. A chronological list of papers is also available.

Topics covered on this page include:

  • Retargetability
  • Machine Code
  • C--, a Portable Assembly Language
  • Binary Translation
  • Debugging
  • Language Design
  • Interpreters
  • Literate Programming
  • Spider - generating language-dependent tools
  • Noweb - a simple, language-independent tool
  • Formal Methods
  • Miscellany
  • Concurrency
  • Error management in compilers
  • Unparsing
  • Program generation
  • Programming Contests
  • File synchronization
  • Physics

  • Retargetability

    I am intrigued by the problems of retargeting machine-dependent software to different hardware platforms. This topic is of especial interest to those who create such software, since it is hard for improvements in debuggers, linkers, etc. to have an impact unless the software works on multiple platforms. My contributions include a retargetable debugger and a toolkit for recognizing and generating machine instructions. A research summary is available.

    Machine code

    Representations of machine instructions

    ldb's breakpoint implementations required a rudimentary facility for recognizing machine instructions. Attempts to generalize that facility led to the New Jersey Machine-Code Toolkit, which uses a new formalism to describe binary encodings. In one sentence, it gives application writers the ability to manipulate true binary representations of machine code while programming at an assembly-language level of abstraction. There are two papers describing the main ideas: Those who actually want to use the toolkit will probably find the following technical reports useful: The toolkit project has also led to a machine-independent method of relocating instructions, described in
    Relocating machine instructions by currying. ACM SIGPLAN '96 Conference on Programming Language Design and Implementation, 226--236, May 1996. [Abstract]
    The toolkit's equation solver is described in
    A simple solver for linear equations containing nonlinear operators. Software---Practice & Experience, 26(4):467--487, April 1996. An earlier version appeared as Technical Report 95-068, Purdue University, Dept of Computer Sciences, November 1995.
    The method we use to test toolkit specifications is described in
    Automatic Checking of Instruction Specifications. 1997 International Conference on Software Engineering, pp 326-336, May 1997 (with Mary F. Fernández).
    My research summary for retargeting describes the toolkit. The toolkit home page provides entrée to the software and its documentation.

    Semantics of machine instructions

    I've developed Lambda-RTL, a new metalanguage for specifying the semantics of machine instructions. You can read about goals and some preliminary results in
    Machine Descriptions to Build Tools for Embedded Systems (with Jack W. Davidson). In ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems (LCTES'98), volume 1474 of LNCS, pages 172--188. Springer Verlag, June 1998.
    The semantic descriptions are based on register transfers. Unlike other tools, Lambda-RTL produces fully explicit register transfers, in which the sizes of all operands and results are explicit, as well as the byte order used for all fetches and stores. The Lambda-RTL translator uses type inference with subtyping to add this information to specifications written in a simple metalanguage. Nascent applications include More information is available on the Zephyr Lambda-RTL pages.

    C--, a Portable Assembly Language

    Compiler-writers often use C as a portable assembly language, so they can avoid the pain of writing multiple optimizing back ends. For this purpose, C's outstanding virtue is its ubiquity. But C was designed as a high-level programming language, and as a compiler-target language it has many deficiencies. In particular, it lacks support for tail calls, exception handling, garbage collection, and lightweight concurrency. I am therefore working with Simon Peyton Jones on the design of C--, a language that is designed as a compiler-target language. My particular interest is in compile-time and run-time support for garbage collection, exceptions, and debugging.

    Simon and I have written three papers about C--, but we're still not satisfied. Please send feedback!

    Christian Lindig and I are buildng the Quick C-- implementation.

    Binary Translation

    I'm working with Cristina Cifuentes on a retargetable platform for binary translation. Cristina has put together a nice overview page. This work is supported by the Australian Research Council. We have written two papers:

    Debugging

    The goal of the Debugging Everywhere project is to make debugging a cheap, ubiquitous service. The idea behind the project is to use Active Debugging Information, i.e., debugging symbols containing executable content. Rather than have the compiler give the debugger a bunch of source-level information about the program, forcing the debugger to figure out what it means on a particular machine, we have the compiler provide the debugger with code it can execute directly to do things like evaluate expressions and print values. This approach will have two benefits: The project is based on earlier work on ldb, a retargetable debugger. I've written two papers and a thesis: My research summary for retargeting describes ldb.

    Interpreters

    There's nothing like a quick interpreter to try out a new language idea. I am especially interested in embedding interpreters into functional languages. I have written Lua-ML, which is an implementation of the Lua scripting language in Objective Caml. It is very useful for scripting and configuring Caml programs; we use it to configure our compiler for C--.

    I have written two papers on Lua-ML:


    Language Design

    Most of my work is in implementation, but I am also interested in language design.

    Literate Programming

    Literate programming is the act of writing programs primarily as documents to be read by human beings, and only secondarily as instructions to be executed by computer. The term was coined by Don Knuth to describe WEB, the tool he created to write TeX. My contributions to literate programming have included analyzing tools, building tools to support multiple programming languages, and simplifying tools by recognizing and eliminating the inessential. I have also published some literate programs:

    Spider - Generating WEBs for different languages

    In 1986, you couldn't write a literate program in a new programming language without first writing a new tool. Incompatible, language-dependent versions of WEB were starting to proliferate. I created Spider, a WEB generator, which can create a WEB system for any language whose lexical and syntactic structure are roughly Algol-like. I described Spider in a ``Literate Programming'' column for CACM, and the system is documented in two Princeton technical reports: Spidery WEB, like the other members of the WEB family, is too complex (see paper below), so I discourage readers from digging up the manuals or the software. noweb is a good alternative.

    Experience with WEB

    We actually used Spidery WEB for a real project. One novelty was the idea that literate programming might be a useful software-engineering technique, even though we never had any intention of creating a literate program of publishable quality. For example, it was helpful to use the literate program to bring new project members up to speed. A colleague and I described the experience in: Readers may find this article valuable for its enumeration of criteria for evaluating literate-programming tools and for its critique of WEB. It is, so far as I know, the only published description of industrial experience with literate programming.

    noweb - Simplification and language-independence

    I created the noweb literate-programming tool to address the shortcomings of WEB, as identified during the experience with Spidery WEB. My contributions include making the first language-independent literate-programming tool, developing a simpler, more general model of literate program, identifying and stripping away all inessential features of WEB, and creating an extensible implementation by using a pipelined architecture. These contributions are described for the non-specialist in this article: The noweb home page provides entrée to the software and its documentation.

    Formal Methods

    At one time, I worked on formal verification of Ada programs, as described in The original was set for large paper and then photoreduced; I'm afraid I've supplied a rather botched reprint whose only virtue is that it fits on standard US letter paper.

    I got interested in file synchronization and developed a new specification technique.

    If you're interested in this topic, check out Unison. Those guys were first, and we're swimming in their wake.

    The paper on breakpoint correctness is an application of formal methods.


    Miscellany

    Concurrency

    As a student project, I explored how combining polymorphism, garbage collection, and the Standard ML modules system could make it easy to write concurrent programs. The technical report describes what I came up with. The closest thing to a research contribution was an implementation idea that John Reppy used in his CML system, but some people find the paper entertaining anyway.

    Error management in compilers

    Have you ever been blasted by a screenful of compiler error messages, only to discover that the first message was the only one that meant anything? If so, you'll be glad to read about

    Unparsing

    The unparsing problem is to take an internal representation of a program and produce a concrete external representation that, when parsed, reproduces the original representation. The trick is to come up with a readable representation. The paper explains how to solve the unparsing problem, focusing on the elimination of unnecessary parentheses.

    Program generation

    Most of my work in machine-level software has involved program generators, and the New Jersey Machine-Code Toolkit is a serious program generator. The paper describes some of the difficulties of making program generators reusable.

    Programming Contests

    Kevin Scott and I ran the 1999 ICFP Programming Contest. The paper describes the problem, the field, and the winning entries.

    File synchronization

    My experience with the Unison file synchronizer got me interested in the problem of specifying the behavior of such a synchronizer. The paper presents some preliminary results.

    Physics

    In a former life, I was a physicist. Write me if you want either of these papers:

    Copyright

    The documents contained in these pages are included to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
    Back to Norman Ramsey's home page.