A compiler driver is the part of the compiler that orchestrates passes to get work done. When you run gcc
, you’re interacting with the driver; it runs the C preprocessor, actual compiler, assembler, and linker.
Our compiler works the same way; when you type uft
, you’re interacting with the driver by telling it to translate between an input language and an output language. There are a quadratic number of input-output pairs, but the driver implements them by composing a linear number of functions, each of which has a constant number of cases. I’m rather proud of it.
Three species of functions
The UFT driver is defined in module UFT
in file uft.sml
, which exports just one function, translate
. The driver “understands” each internal language of the UFT, and each language may be supported with these components:
An emitter, which knows how to write a program out to a file
A reader, which knows how to read a program in from a file
A materializer, which knows how to materialize a program from an input in a given language.
The emitters are built from unparsers; in uft.sml
, look at the definitions of functions emitVO
and emitVS
. In those emitters, our unparsing functions are composed with a little machinery that writes a list of lines out to a file. Function emitScheme
is slightly different; it uses a prettyprinter, which is a fancy species of unparser that computes indentation and line breaks.
Reader pipelines
Emitters are embeddings, so they don’t have to deal with the possibility of error. Readers are projections, so they are more complicated. Each reader is defined by function composition. Such a composition looks a lot like a Unix pipeline, only it is typechecked. And unfortunately, the code can be hard to get past the type checker. The only effective way I know to do it is write the function one line at a time, and typecheck it after each line.
As an example, grab the source code while I break down function VS_of_file
. You’ll see composition operators >>>
, >>> !
, and >=>
, which are explained in more detail in my handout on function composition with error types.
The function pipeline begins with
lines
. This function has typeinstream -> string list
, so it cannot fail. Its output has typeline list
, and it can be fed down the pipeline with the composition operator>>>
.(The composition operator
>>>
is like the ordinary function compositiono
, but with arguments in the other order; information flows in the direction of the arrow. It obeys the law(f >>> g) x = g (f x)
.)Function
AsmLex.tokenize
has typestring -> token list error
. But the output of our pipeline so far has typestring list
. I want to tokenize every string in the list, so I applymap
totokenize
. Functionmap AsmLex.tokenize
produces a result of typetoken list error list
.In a translation pipeline, no value of type \(\tau\) error list is ever useful; I always want to convert it to \(\tau\) list error. That’s done by piping it into
Error.list
. The result has typetoken list list error
.The combination of
map f >>> Error.list
is very common, and in your own code I recommend you useError.mapList f
, which is equivalent. UsingError.mapList
, you don’t have to worry about operator precedence going wrong. (A good example can be found in reader functionschemeOfFile
.)Now I have the tokens; I just need to parse them. The input is almost what I need;
AsmParse.parse
is expecting an argument of typetoken list list
. I have that, but wrapped inerror
. This situation is made for Kleisli composition! I complete the pipeline by feeding the intermediate result intoAsmParse.parse
using the Kleisli arrow>=>
. The final result isAssemblyCode.instr list error
, which is what I was trying to read.
A function like VS_of_file
should be defined incrementally, starting with the val
, the type signature, and the first function. At that point, compile, and hope for the compiler to complain that the expression’s type doesn’t match the type signature. Then add another function to the pipeline, and repeat. The idea is that at every step, the pipeline always typechecks. But until the last step, it doesn’t have the right result type.
To help you write your own composition pipelines, I’ve written a short note on function composition with error types.
Materializer pipelines
A reader can do only one thing: read a program in from a file and return it. Like a reader, a materializer returns a program, but it can work in two ways:
Materialize the program from a file (by calling a reader)
Materialize the program by translating it from a higher-level language.
The choice is determined by the identity of the input language, which is given to the materializer as a parameter.
Let’s look at two materializers. Dive back into uft.sml
and look at function VO_of
, which materializes virtual object code.
If the input language is
VO
, we can materializeVO
code by reading it from a file. Except we can’t, because I haven’t bothered to define aVO
reader for the UFT. Virtual object code is the end of the translation pipeline; since there is nothing we can translateVO
to, there is no point in reading it.If the input language is anything else (
inLang
), then we can getVO
code by first materializingVS
code, then assembling it. To do that, we build a little translation pipeline:VS_of inLang >=> Assembler.translate
This case works with any
inLang
that can be translated to assembly code. That’s what makes it a Universal, Forward translator.
Next, look at function VS_of
, which materializes assembly language:
If the input language is
VS
, we can materializeVS
code by reading it from a file.If the input language is anything else (
inLang
), we’d like to get K-normal form frominLang
, then generate assembly code from K-normal form. That will be the main event of module 6.
Eventually we’ll have a materializer for every internal language, and most of them look like a combination of these two: a reading option and a translation option. And K-normal form will have two materializers because it will have two flavors: one with names as VM registers and one with names as strings.
The Universal Forward Translator
The heart of the Universal Forward Translator is function translate
in file uft.sml
. It works by case analysis on the output language: whatever language is asked for, translate
tries to materialize it and emit it. In module 5 I ship cases for VO
, VS
, and HO
. You’ll add the KN
case, and we’ll add the remaining cases as we add languages and translations in modules 8, 9, and 10.