Send email to
Send physical documents to my postal address.
If you come yourself you'll want directions.
Telephone +1 617 627 4923.
Fax +1 617 627 2227 (but if you can, scan and email instead)
My vita and public key (now with photo!) are online.
I type 75 words per minute. We are typists first, so test yourself.
Who I am and what I do
I do research in programming languages and teach a mix of classes at Tufts University. I have had my fingers in a lot of pies!
Of all the problems in making programming-language infrastructure reusable, the most challenging is this: if we are given a new machine and are told what its instructions do, how do we generate code for it? Working with my student and postdoc João Dias, I have developed methods of automatically generating an instruction selector—the heart of a code generator—from declarative machine descriptions (POPL 2010). We've also published the abstraction that the instruction selector is built on (POPL 2011).
Ideas about infrastructure are most convincing when they are deployed. In 2014, my group’s ideas about code generation were deployed in a new code generator for the Glasgow Haskell Compiler (GHC). Deployment required several years’ effort from João Dias, Simon Marlow of Microsoft Research, Simon Peyton Jones of Microsoft Research, and me. The new code generator’s most interesting component is a reusable, higher-order optimization library, Hoopl (Haskell 2010), which uses generalized algebraic data types to guarantee at compiler-compile time that no matter what Haskell program it is given, GHC never builds an ill-formed control-flow graph.
Languages and learning
People who want to learn to use programming languages effectively can try starting with an industrial language, but the sheer size of a typical industrial language and library make it hard to discover and apply the ideas that make the language worth learning. People can also consult books, but existing books primarily talk about programming languages, or they talk about how programming languages are implemented, or they steer people to industrial languages. To make it possible for people not just to learn about great ideas in programming languages, but to build software that applies these ideas effectively, I have designed and implemented a collection of tiny programming languages for language learners. Using my languages, learners can build programs that apply some of the greatest ideas in programming languages: functions, types, and objects. The languages form the skeleton of a new book, Programming Languages: Build, Prove, and Compare, which will be published by Cambridge University Press.
While implementing the languages I created for the book, I found an interesting problem: there is a big engineering gap between simple, definitional interpreters, which we write to illustrate precisely what a language is supposed to mean, and industrial-strength interpreters and compilers, which we write to run programs efficiently. To narrow this gap, I’ve investigated ways of engineering definitional interpreters to be more efficient, without making them much harder to write (PPDP 2013).
Because programming languages are the medium in which software is written, they are also the medium in which beginners learn to build software. The programming language and technique taught in introductory courses are widely believed to affect learning. I have recently piloted a first course based on How to Design Programs by Felleisen et al., which uses functional programming languages and techniques. My analysis, refinements, and recommendations have just been published at the International Conference on Functional Programming (ICFP 2014).
Language applications, design, and semantics
Languages and techniques are refined, evaluated, and improved by using them on real problems. I’ve applied functional-programming techniques to computational biology; working with graduate students in computational biology, I showed ways in which functional programming works to help solve their problems (ICFP 2012). We also identified obstacles that prevented functional programming from working as well as it could have.
Another application area, machine learning, builds on Bayesian reasoning about probabilities. But in current practice, a programmer’s Bayesian ideas are usually hand-translated into a general-purpose programming language like Matlab or C++. Such programmers might be much more productive if given a probabilistic programming language, in which Bayesian reasoning can be expressed directly. With support from DARPA, I'm working with colleagues from BAE Systems and Northeastern University on the design and semantics for probabilistic programming languages.
Over the past seven years, I have revamped a significant portion of Tufts's required undergraduate curriculum in computer science. Our students are required to take four courses that have programming assignments. As a result of my work, these courses now ask much more of our students than they did formerly, and they also offer more. In particular, we now offer many more challenging, rewarding problems of a sort that students would not think to tackle on their own.
In addition to the pilot introductory course mentioned above, I have completely redesigned the third and fourth courses in our programming sequence.
Our third course, COMP 40 (Machine Structures and Assembly-Language Programming), is most similar to a course in machine organization or systems programming. I redesigned the course to focus on two sets of skills: applied data abstraction and machine-level programming. The redesigned course, which has now been taught by three other instructors, is viewed by some as the most valuable course in our department. It is highly praised on surveys of graduating seniors, who are given the opportunity to identify just one course that exemplifies "what a truly excellent college course should be."
I replaced an older "paradigms" course in programming languages with a new required course in programming languages, COMP 105, which demands that students learn to use key programming-language ideas in actual programming, and which also demands some mathematical content (e.g., operational semantics, equational proofs). The course, which uses my draft book Programming Languages: Build, Prove, and Compare, effectively serves as our fourth programming course. COMP 105 is also highly regarded by students and mentioned on senior surveys, although not quite as much as COMP 40.
I also have some opinions about what else we should be teaching.
My teaching and curricular work were recognized with the 2015 Lerman-Neubauer Prize for Outstanding Teaching and Advising. This prize is awarded annually to a member of the Tufts faculty who has had a profound intellectual impact on his or her students, both inside and outside the classroom.
This page show my most significant and most recent papers. Links are to abstracts so you can check out the topic without downloading a monster. For a complete view, including older work, see my publications list.
Five most significant papers
- Relocating Machine
Instructions by Currying.
Proceedings of the ACM
SIGPLAN '96 Conference on Programming Language Design and Implementation,
in SIGPLAN Notices, 31(5):226–236, May 1996.
This paper connects two worlds: the ``low cult'' of systems programming and the pure, mathematical world of the lambda-calculus. The key insight is that relocation, which is a low-level operation performed on binary code, is an instance of currying, which is the expression of a multiple-argument function in the lambda-calculus.
- Stochastic Lambda Calculus and
Monads of Probability Distributions (with Avi Pfeffer).
Proceedings of the 29th ACM Symposium on the Principles of Programming
Languages, in SIGPLAN Notices, 37(1):154–165, January
This paper explores the design of probabilistic languages in a foundational, principled way. Its special contribution is to analyze important implementation techniques in a way that is completely formal and is rigorously connected to the theory of probability.
- A Transformational Approach to
Binary Translation of Delayed Branches (with Cristina Cifuentes).
ACM Transactions on Programming Languages and Systems,
25(2):210–224, March 2003.
The paper solves a small but difficult problem in analysis of binary codes. This problem is repeatedly a stumbling block for industry groups that work with binary codes, and I believe our solution is definitive.
- An Expressive Language of
Signatures (with Kathleen Fisher and Paul Govereau).
In Proceedings of the Tenth
ACM SIGPLAN International Conference on Functional Programming
(ICFP'05), pages 27–40, September 2005.
Selected as one of the best papers of ICFP 2005.
In this paper, we identified an important class of programming problems the solutions to which cannot be expressed in current languages, and we showed that these problems can be solved by new mechanisms that cohere with an existing language.
- Automatically Generating
Instruction Selectors Using Declarative Machine Descriptions (with João Dias).
In Proceedings of the 37th ACM Symposium on the Principles of Programming
Languages, pages 403–416, January 2010.
The most beautiful results from João Dias's doctoral dissertation: (a) if all you know is the semantics of the intermediate code and the target instruction set, generating a code generator is undecidable; and (b) by using a clever new heuristic search based on algebraic laws, João can generate code generators for real machines quickly. The core of the algorithm combines Hoare logic and unification to find sequences of machine instructions that implement intermediate code. Will appeal especially to those who like inference rules with their compilers.
Five other significant papers
- Specifying Representations of
Machine Instructions (with Mary F.
ACM Transactions on Programming Languages and Systems,
19(3):492–524, May 1997.
This is the most technical and the definitive description of my early work on declarative machine descriptions.
- A Single Intermediate Language
That Supports Multiple Implementations of Exceptions (with Simon L. Peyton
Proceedings of the ACM SIGPLAN '00 Conference on Programming Language
Design and Implementation, in SIGPLAN Notices,
35(5):285–298, May 2000.
This paper is the most technical and rigorous of the C-- papers. It exemplifies what I am trying to achieve in C--: clean, low-level mechanisms that compiler writers can use to implement different high-level–language features and to control cost tradeoffs.
- An Algebraic Approach to File
Synchronization (with Elöd Csirmaz).
In Proceedings of the 8th European Software Engineering Conference (ESEC)
and 9th ACM SIGSOFT Symposium on the Foundations of Software Engineering
(FSE-9), pages 175–185, September 2001.
This paper discusses how to maintain consistency among multiple replicas of files that may be modified concurrently. We proposed reasoning about this problem by examining the algebraic structure of a sequence of modifications.
- A Generalized Algorithm for
Graph-Coloring Register Allocation (with Michael D. Smith and
ACM SIGPLAN '04
Conference on Programming Language Design and Implementation, in
SIGPLAN Notices, 39(6):277–288, June 2004.
A new technique for dealing with irregularities in target machines, which arise because not all machine registers can be used interchangeably. We hope this will be the definitive paper on handling irregular register files in a graph-coloring register allocator.
- Staged Allocation: A Compositional
Technique for Specifying and Implementing Procedure Calling Conventions
(with Reuben Olinsky and Christian Lindig).
In Proceedings of the
33rd ACM Symposium on the Principles of Programming Languages,
pages 409–421, January 2006.
A specification language for parameter passing, unique in having a formal semantics. Together with an earlier paper on stack-frame layout, a complete approach to calling conventions.
Other recent papers
These are some recent papers not listed above. They are in chronological order, so the most recent work is at the bottom.
- Hoopl: A Modular, Reusable
Library for Dataflow Analysis and Transformation (with João Dias and Simon L.
In Proceedings of the 3rd ACM SIGPLAN Symposium on Haskell
(Haskell 2010), September 2010.
We implement dataflow analysis and transformation in modular fashion using advanced features of Haskell. Unlike our 2009 technical report, which emphasizes the advantages of using Hoopl, this paper focuses on the implementation, which has been carefully modularized to separate each of the tricky bits from all of the others. Of greatest interest to hard-core functional programmers and compiler writers.
- Resourceable, Retargetable, Modular
Instruction Selection Using a Machine-Independent, Type-Based Tiling of
Low-Level Intermediate Code (with João Dias).
In Proceedings of the 38th ACM Symposium on the Principles of Programming
Languages, pages 575–586, January 2011.
In most compilers, building an instruction selector requires a mapping from intermediate form to target-machine instructions. The mapping must be defined once per target machine. Here we describe a single mapping that works for all register machines. By using this mapping, we make the machine-dependent components simple enough that they can be generated automatically.
- Teaching Technical Writing Using the Engineering Method. March 2011. A handbook for teaching writing to students in science or engineering. Accompanied by a student's edition.
- Embedding an Interpreted Language
Using Higher-Order Functions and Types.
Journal of Functional
Programming, 21(6):585–615, November 2011.
(A previous version appeared in ACM SIGPLAN 2003 Workshop on
Interpreters, Virtual Machines and Emulators.).
Homage to Olivier Danvy: How to embed an application-specific function into an interpreter simply by describing its type. The paper, while not technically deep, shows the capabilities of functional languages to elegant advantage, and it is representative of my work on interpreters.
- Experience Report: Haskell in
Computational Biology (with Noah M. Daniels and Andrew Gallant).
In Proceedings of the Seventeenth ACM SIGPLAN International Conference on
Functional Programming (ICFP'12), pages 227–234, September
Selected as one of the best papers of ICFP 2012.
Encouragement for computational biologists to try Haskell. If you are a functional programmer, you will see an interesting variation on lazy search. You might also get some insight into why beginners may find it difficult to use QuickCheck.
- Engineering Definitional
Interpreters (with Jan Midtgaard and Bradford Larsen).
In Proceedings of the 15th International Symposium on Principles and
Practice of Declarative Programming (PPDP'13), pages 121–132,
Definitional interpreters can be simple but often don't perform very well. Mature bytecode interpreters can perform very well but aren't simple or easy to build. This paper recounts our extensive search for the "best simple" variation on a definitional interpreter written in a functional language.
- MRFy: Remote Homology Detection for Beta-Structural Proteins Using Markov
Random Fields and Stochastic Search (with Noah M. Daniels, Andrew
Gallant, Norman Ramsey, and
Lenore J. Cowen).
IEEE/ACM Transactions on Computational Biology and Bioinformatics,
A preliminary version of this paper appeared in the Fourth ACM
International Conference on Bioinformatics, Computational Biology, and
Biomedical Informatics (ACM BCB 2013), where it was recognized as a Best
Results of our work on stochastic search for protein homologies. The search tool was written in the functional language Haskell.
- On Teaching How to Design
Programs: Observations from a Newcomer.
In Proceedings of the Nineteenth ACM SIGPLAN International Conference on
Functional Programming (ICFP'14), pages 153–166, September
In Fall 2013 I taught introductory programming using the interesting textbook How to Design Programs. If you might teach from this book, this paper contains lots of useful information about how to prepare and teach such a course.
- Symbolic Bayesian Inference
by Lazy Partial Evaluation (with Chung-Chieh Shan).
In Proceedings of the 44th ACM Symposium on the Principles of Programming
Languages, pages 130–144, January 2017.
To draw a probabilistic inference from an observation of measure zero, disintegrate a continuous space into an infinite family of posterior distributions. Then use the observation to select one member of the family. Chung-Chieh Shan's work is brilliant and I'm proud to be associated with it. (If you liked my work on the probability monad, this paper is a good next step.)
- Beyond Relooper: Recursive Translation of Unstructured Control Flow to
Structured Control Flow (Functional Pearl).
Proceedings of the ACM on Programming Languages, 6(ICFP), August 2022.
This functional pearl revives a 1973 program translation that can convert any reducible control-flow graph to structured control flow, without introducing superfluous code, variables, or tests. When implemented functionally, the algorithm is simple, and its correctness is easy to argue. The key new idea is that the translation can be implemented by a function that is recursive over the dominator tree.
For a complete view, including older work, see my publications list.
The ACM requires this disclaimer:
The documents contained in these pages are included to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
In Spring 2023 I am teaching CS 106 (Simple Virtual Machines and Language Translation).
At Tufts, I have taught COMP 250RTS (Run-Time Systems); COMP 150PP (Probabilistic Programming Languages); COMP 150TW (The Engineering Method of Technical Writing); COMP 50, a pilot version of the first course that I co-developed with Ben Hescott; COMP 40 (Machine Structures and Assembly-Language Programming); COMP 105 (Programming Languages); COMP 150GIT (Functional Programming and Source-Code Control); COMP 150DAO (Dataflow Analysis and Optimization); and COMP 150FP (Advanced Functional Programming). I have also taught many courses at other universities that need not be named.
Resources for Students
Office hours for Fall 2019 are by appointment. To make an appointment, please send email with three times that work for you.
I have gathered material of interest to research
students, including resources for
how to give a talk.
Undergraduate research students might also be interested, especially in my thoughts about how to get admitted to a PhD program. And I tell you what you need to give me (and when) if you want a letter of recommendation.
If you want to teach at a community college, here are some tips from a community-college dean.
My professional home is in the Department of Computer Science at Tufts University. My GPG public key's fingerprint is 72F7 B434 AB7C D7D0 D5A9 B537 BD01 D704 7276 3614. My ORCID ID is 0000-0002-5435-1135. Some people think I'm a power user; others think I never sleep. They may be right; my ~/bin directory contains over a thousand scripts, almost all of which I wrote myself. But a lot of people know me only as the creator of noweb.
I have twice served ACM as a member of the SIGPLAN Executive Committee. I served as program chair for ICFP '07.
I signed the email charter; if you won't, I won't. After years of relying on the kindness of strangers, I finally started carrying a cell phone. I no longer maintain a hot list; this is more of a random list. I like Free DNS. An interest in personal productivity and pointers from Benjamin Pierce and Phil Wadler got me to Inbox Zero on Wed 21 Feb 2007 at 6:00 PM. After serious lossage caused by various alarums and excursions, I recovered Zero at 6:30 PM on Mon 31 Dec 2007. It was a pleasure to start the New Year with an empty inbox! After starting at Tufts, I got a little behind; at the end of my first year, my email debt was over 600 messages. At the end of my third year, at 9:12 PM on Monday 23 May 2011, I recovered Zero once again, but I cheated—I put 600 messages from 2009 and 2010 into an email demilitarized zone. The next time I got to Zero was at 5:03 PM on Wednesday 29 August 2012, again at 2:54 PM on 30 May 2014, again at 6:34 PM on 12 November 2014, and again at 12:00 PM on 7 June 2017. For a time I almost thought I detected a pattern there.
a Double Tiger,
a Bellcore alumnus,
Luxuriant Flowing Hair Club for Scientists,
and I have an Erdös
Number of 3.
I've been seen wearing orange and
Despite these distinguished credentials,
I'm not ashamed to subscribe to a magazine with a centerfold.
A secret vice is that I used to answer programming questions for
fun; at one time,
I was the 40th most reputable contributor (out of over 100,000)
on Stack Overflow
(motto: "This thread has been closed as Off Topic").
Along the way I earned silver Specialist badges in C, Haskell,
and a couple of other topics.
I’ve seen one total eclipse of the sun. Although I read extensively about viewing eclipses, I still was not prepared to take it all in. You can read my advice to my future self.
Although it surprises people, for over forty years I have
fan—though I wouldn't mind realignment.
When it's not football season, I've been known to
play Guild Wars.
I also have a rare autographed copy of
I try to avoid P. J. Brown's deadly sins.
I'm married to a licensed
I've appeared onstage
(and in various
dreams are of sleeping.