A compiler driver is the part of the compiler that orchestrates passes to get work done. When you run gcc, you’re interacting with the driver; it runs the C preprocessor, actual compiler, assembler, and linker.

Our compiler works the same way; when you type uft, you’re interacting with the driver by telling it to translate between an input language and an output language. There are a quadratic number of input-output pairs, but the driver implements them by composing a linear number of functions, each of which has a constant number of cases. I’m rather proud of it.

Three species of functions

The UFT driver is defined in module UFT in file uft.sml, which exports just one function, translate. The driver “understands” each internal language of the UFT, and each language may be supported with these components:

An emitter, which knows how to write a program out to a file
A reader, which knows how to read a program in from a file
A materializer, which knows how to materialize a program from an input in a given language.

The emitters are built from unparsers; in uft.sml, look at the definitions of functions emitVO and emitVS. In those emitters, our unparsing functions are composed with a little machinery that writes a list of lines out to a file. Function emitScheme is slightly different; it uses a prettyprinter, which is a fancy species of unparser that computes indentation and line breaks.

Reader pipelines

Emitters are embeddings, so they don’t have to deal with the possibility of error. Readers are projections, so they are more complicated. Each reader is defined by function composition. Such a composition looks a lot like a Unix pipeline, only it is typechecked. And unfortunately, the code can be hard to get past the type checker. The only effective way I know to do it is write the function one line at a time, and typecheck it after each line.

As an example, grab the source code while I break down function VS_of_file. You’ll see composition operators >>>, >>> !, and >=>, which are explained in more detail in my handout on function composition with error types.

The function pipeline begins with lines. This function has type instream -> string list, so it cannot fail. Its output has type line list, and it can be fed down the pipeline with the composition operator >>>.

(The composition operator >>> is like the ordinary function composition o, but with arguments in the other order; information flows in the direction of the arrow. It obeys the law (f >>> g) x = g (f x).)
Function AsmLex.tokenize has type string -> token list error. But the output of our pipeline so far has type string list. I want to tokenize every string in the list, so I apply map to tokenize. Function map AsmLex.tokenize produces a result of type token list error list.
In a translation pipeline, no value of type \(\tau\) error list is ever useful; I always want to convert it to \(\tau\) list error. That’s done by piping it into Error.list. The result has type token list list error.

The combination of map f >>> Error.list is very common, and in your own code I recommend you use Error.mapList f, which is equivalent. Using Error.mapList, you don’t have to worry about operator precedence going wrong. (A good example can be found in reader function schemeOfFile.)
Now I have the tokens; I just need to parse them. The input is almost what I need; AsmParse.parse is expecting an argument of type token list list. I have that, but wrapped in error. This situation is made for Kleisli composition! I complete the pipeline by feeding the intermediate result into AsmParse.parse using the Kleisli arrow >=>. The final result is AssemblyCode.instr list error, which is what I was trying to read.

A function like VS_of_file should be defined incrementally, starting with the val, the type signature, and the first function. At that point, compile, and hope for the compiler to complain that the expression’s type doesn’t match the type signature. Then add another function to the pipeline, and repeat. The idea is that at every step, the pipeline always typechecks. But until the last step, it doesn’t have the right result type.

To help you write your own composition pipelines, I’ve written a short note on function composition with error types.

Materializer pipelines

A reader can do only one thing: read a program in from a file and return it. Like a reader, a materializer returns a program, but it can work in two ways:

Materialize the program from a file (by calling a reader)
Materialize the program by translating it from a higher-level language.

The choice is determined by the identity of the input language, which is given to the materializer as a parameter.

Let’s look at two materializers. Dive back into uft.sml and look at function VO_of, which materializes virtual object code.

If the input language is VO, we can materialize VO code by reading it from a file. Except we can’t, because I haven’t bothered to define a VO reader for the UFT. Virtual object code is the end of the translation pipeline; since there is nothing we can translate VO to, there is no point in reading it.
If the input language is anything else (inLang), then we can get VO code by first materializing VS code, then assembling it. To do that, we build a little translation pipeline:
```
VS_of inLang >=> Assembler.translate
```
This case works with any inLang that can be translated to assembly code. That’s what makes it a Universal, Forward translator.

Next, look at function VS_of, which materializes assembly language:

If the input language is VS, we can materialize VS code by reading it from a file.
If the input language is anything else (inLang), we’d like to get K-normal form from inLang, then generate assembly code from K-normal form. That will be the main event of module 6.

Eventually we’ll have a materializer for every internal language, and most of them look like a combination of these two: a reading option and a translation option. And K-normal form will have two materializers because it will have two flavors: one with names as VM registers and one with names as strings.

The Universal Forward Translator

The heart of the Universal Forward Translator is function translate in file uft.sml. It works by case analysis on the output language: whatever language is asked for, translate tries to materialize it and emit it. In module 5 I ship cases for VO, VS, and HO. You’ll add the KN case, and we’ll add the remaining cases as we add languages and translations in modules 8, 9, and 10.