Secrets of the UFT driver

A compiler driver is the part of the compiler that orchestrates passes to get work done. When you run gcc, you’re interacting with the driver; it runs the C preprocessor, actual compiler, assembler, and linker.

Our compiler works the same way; when you type uft, you’re interacting with the driver by telling it to translate between an input language and an output language. There are a quadratic number of input-output pairs, but the driver implements them by composing a linear number of functions, each of which has a constant number of cases. I’m rather proud of it.

Three species of functions

The UFT driver is defined in module UFT in file uft.sml, which exports just one function, translate. The driver “understands” each internal language of the UFT, and each language may be supported with these components:

The emitters are built from unparsers; in uft.sml, look at the definitions of functions emitVO and emitVS. In those emitters, our unparsing functions are composed with a little machinery that writes a list of lines out to a file. Function emitScheme is slightly different; it uses a prettyprinter, which is a fancy species of unparser that computes indentation and line breaks.

Reader pipelines

Emitters are embeddings, so they don’t have to deal with the possibility of error. Readers are projections, so they are more complicated. Each reader is defined by function composition. Such a composition looks a lot like a Unix pipeline, only it is typechecked. And unfortunately, the code can be hard to get past the type checker. The only effective way I know to do it is write the function one line at a time, and typecheck it after each line.

As an example, grab the source code while I break down function VS_of_file. You’ll see composition operators >>>, >>> !, and >=>, which are explained in more detail in my handout on function composition with error types.

A function like VS_of_file should be defined incrementally, starting with the val, the type signature, and the first function. At that point, compile, and hope for the compiler to complain that the expression’s type doesn’t match the type signature. Then add another function to the pipeline, and repeat. The idea is that at every step, the pipeline always typechecks. But until the last step, it doesn’t have the right result type.

To help you write your own composition pipelines, I’ve written a short note on function composition with error types.

Materializer pipelines

A reader can do only one thing: read a program in from a file and return it. Like a reader, a materializer returns a program, but it can work in two ways:

The choice is determined by the identity of the input language, which is given to the materializer as a parameter.

Let’s look at two materializers. Dive back into uft.sml and look at function VO_of, which materializes virtual object code.

Next, look at function VS_of, which materializes assembly language:

Eventually we’ll have a materializer for every internal language, and most of them look like a combination of these two: a reading option and a translation option. And K-normal form will have two materializers because it will have two flavors: one with names as VM registers and one with names as strings.

The Universal Forward Translator

The heart of the Universal Forward Translator is function translate in file uft.sml. It works by case analysis on the output language: whatever language is asked for, translate tries to materialize it and emit it. In module 5 I ship cases for VO, VS, and HO. You’ll add the KN case, and we’ll add the remaining cases as we add languages and translations in modules 8, 9, and 10.