(This module is also available in PDF, in two-column and one-column formats.)
Introduction
This week you’ll start your journey from a machine-level language to a high-level language. You’ll implement a language called K-normal form, which is nearly a subset of Scheme. Then next week, you’ll generate assembly code from K-normal form, and you’ll be able to run your first high-level-language code.
What am I doing?
Understand the three different roles names can play in Scheme code, and write a compiler pass that uses special-purpose syntax to identify what role each occurrence of each name is playing.
Define an K-normal form, which is a low-level subset of Scheme.
to represent expressions inDefine an embedding/projection pair that relates K-normal form to vScheme syntax.
Extend the UFT driver with support for K-normal form.
Why am I doing it?
Translating names is one of the deepest parts of any compiler, because it can be done only within some kind of context or environment that says what each name stands for. Moving the contextual information into syntax makes all the subsequent compiler passes much, much easier.
K-normal form is the next step in our long-term plan of lifting machine-level code up to something nice. It is the lowest-level intermediate code that resembles a high-level language more than an assembly language—and “making machine code look like a high-level language” is a useful tactic in almost any compiler.
Embedding and projection make it possible to debug. To get K-normal form out of your translator, just embed it into vScheme and use my prettyprinter for vScheme. To get K-normal form into your translator, write vScheme and project it into K-normal form.
How?
Before lab you’ll study the roles of names, and you’ll learn two related representations of vScheme:
VScheme
, in which a name’s role has to be learned by looking up the name in the environment; andUnambiguousVScheme
, in which the name’s role is made explicit in the abstract syntax.In lab you’ll complete the disambiguation pass that looks up each name in the environment and selects the correct syntax for each occurrence. I’ve done the boring parts; you’ll do the interesting parts.
After lab you’ll study the grammar of K-normal form and its relationship to Scheme. Then you’ll define its representation, an embedding into
VScheme
, and a projection fromUnambiguousVScheme
.Finally you’ll add support to the Universal Forward Translator so it can read and write K-normal form. At the end of the week, you’ll deliver a working Universal Forward Translator that includes a
kn-kn
pass. This pass reads vScheme, disambiguates it, projects it to K-normal form, embeds the result back into vScheme, and prettyprints it.You’ll also deliver test cases: one that exercises every syntactic form in KNF, and one each for every possible way that projection can fail.
Detailed instructions are below.
The big picture
K-normal form is about two things: language paradigm and naming.
Target paradigm (assembly language): imperative code, commands, assignment, resembles a subset of C
Source paradigm (K-normal form): applicative code, function calls,
let
bindings, is a subset of Scheme
Once we have K-normal form, we’ll add features until eventually we’ll be compiling full Scheme.
Ideas about naming are deeper. The key idea is this: in syntax, all names look alike, but at run time, different names behave differently. And those differences can be discovered by inspecting a compile-time environment:
At run time, a global name refers to a location in the VM state’s
globals
table.At run time, a local name (one that is bound by
let
andlambda
) refers to a VM register.The name of an ordinary function (like
map
orfoldr
) codes for an ordinary function call.The name of a primitive function (like
cons
or+
) codes for a VM instruction.
In Scheme source code, all these names are written using the same syntax. In K-normal form, each species of name and each species of call is written using explicitly different syntax—the syntax instantly identifies which variables are global vs local and which calls are ordinary vs VM instructions. These syntactic distinctions are independent of K-normal form; are also made in a high-level language called “unambiguous vScheme”. Transforming original vScheme into unambiguous vScheme, a translation I call disambiguation, is the subject of this week’s lab.
Aside from the explicit distinctions of names and calls, K-normal form adds two key invariants to unambiguous Scheme:
The value of every intermediate expression has a name.
Function definitions do not nest. (That is, no function may refer to parameters or variables of an enclosing function.)
These two invariants are common to many low-level intermediate languages for many different kinds of compilers.
Here’s a snapshot from the video, cleaned up and with this week’s parts highlighted:

The module step by step
Before lab: Understand the big picture, especially names.
Download updates. Update your git repository in two steps:
Be sure all your own code is committed. Confirm using
git status
.Update your working tree by running
git pull
(or you might possibly needgit pull origin main
). You should not have any merge conflicts.
Verify your build. Verify that you can build the
uft
binary withmake
, and that it recognizes the translationho-ho
. Any expression with a name or a function application should exit with an uncaught exception, but expressions without names and applications should go through:> echo '(+ 2 2)' | uft ho-ho Uncaught exception: LeftAsExercise: disambiguate APPLY > echo '(if #t 1 0)' | uft ho-ho (if #t 1 0)
Understand the big picture. The overview video about the high-level languages of the UFT, including all the transformations we will implement presents the big picture of the Universal Forward Translator. Watch it before lab.
Understand the three meanings of names. Read the first part of the handout on Unambiguous vScheme. Follow up by reading the
referent
type andreferent
function defined at the beginning of filedisambiguate.sml
.Study vScheme. Look over the handout on vScheme, and study the definition in module
VScheme
in filevscheme.sml
. Be sure you understand the formsVAR
,SET
, andAPPLY
; these are the forms you will be disambiguating in lab.Study Unambiguous vScheme. Complete the handout on Unambiguous vScheme, and study the definition in module
UnambiguousVScheme
in filevscheme.sml
. Be sure you understand the formsLOCAL
,GLOBAL
,SETLOCAL
,SETGLOBAL
,FUNCALL
, andPRIMCALL
; these are the forms that replace the ambiguous forms in step 6.Understand eta expansion. In file
disambiguate.sml
, be sure you understand what functionetaExpand
is doing. Eta expansion converts a primitive into an ordinary function. If you have not encountered the eta expansion or reduction in CS 105, you can consult any of the following resources:Short explanation from MLton, an ML compiler
Short documentation, from Dotty, a Scala compiler
Long post on Medium about Scala
YouTube video also about Scala
To paraphrase one of the longer posts,
Eta-expansion is what compiler does “behind the scenes” when it notices that you need a value, but are provided a primitive.
Review. During lab, the ambiguous forms will be replaced:
- The original
VAR
form must be disambiguated into eitherLOCAL
orGLOBAL
. - The original
SET
form must be disambiguated into eitherSETLOCAL
orSETGLOBAL
. - The original
APPLY
form must be disambiguated into eitherFUNCALL
orPRIMCALL
.
Before lab begins, be sure you can answer these questions:
For each of the three possible referents of a name (local, primitive, and other global), what form replaces the
VAR
form?For each of the three possible referents of a name (local, primitive, and other global), what form replaces the
SET
form?If a name appears in function position in
APPLY
, for each of the three possible referents of the (local, primitive, and other global), what form replaces theAPPLY
form?If a non-name appears in function position in
APPLY
, what form replaces theAPPLY
form?
- The original
Consider reading ahead. If you want the opportunity to ask informed questions during lab, you could consider reading the handout on K-normal form.
Lab: Disambiguate names
Disambiguate names. In file
disambiguate.sml
, complete functionexp
: Using your answers to the questions in step 8, replace all uses ofLeftAsExercise
with new code, then remove the definition of exceptionLeftAsExercise
.Function
exp
is nested inside functionexp'
, which tracks context by keeping listlocals
. The list contains all formal parameters and let-bound variables. That list is used by functionreferent
to determine the referent of any vScheme name.It’s worth revisiting the disambiguation section of the handout on Unambiguous vScheme.
You can test your disambiguator by running
uft ho-ho
After lab: K-normal form
Defining K-normal form
Define K-normal form. In this step you define an internal representation (abstract-syntax tree) for K-normal form. Start by reading all about K-normal form.
Now edit module
KNormalForm
in fileknf.sml
. Redefine type'a exp
, which takes the representation of a name as a type parameter.1 In your definition, each of the metavariables in the K-normal form handout should be represented as follows:The atomic forms Metavariable Representation e
'a exp
x
'a
v
literal
@
vmop
Types
literal
andvmop
are already defined in the file; they are (respectively)ObjectCode.literal
andPrimitive.primitive
.If a type called
string
orname
appears anywhere in your representation, you are doing it wrong. The type of a name has to be unknown; it’s type parameter'a
.Before you move on to the next step, get a sanity check from a classmate or from a member of the course staff.
Implement the K-normal form utility functions. File
knf.sml
has placeholders for two functionssetglobal
andgetglobal
. Each is an abbreviation that creates an expression of the form@(x₁, ..., xₙ, v)
, where@
is eitherPrimitive.getglobal
orPrimitive.setglobal
. Forgetglobal
,n
is zero, and forsetglobal
,n
is one.The abbreviations aren’t strictly necessary, but they will save you some hassle down the road.
Embedding K-normal form into Scheme
Embed K-normal form into (ambiguous) vScheme. In file
knembed.sml
, you will find a template for two functions,value
anddef
, with these types:val value : KNormalForm.literal -> VScheme.value val def : VScheme.name KNormalForm.exp -> VScheme.def
We embed directly into (ambiguous)
VScheme
because our goal is to reuse the prettyprinter forVScheme
, and it’s actually easier than embedding untoUnambiguousVScheme
.Implement these functions.
The
value
function is actually a projection, not an embedding: the VM code supportsfloating-point values,string values andnil
, neither of which can be written as vScheme literals. Nonetheless, we’re going to treat it as if it were an embedding—we’re going to cheat.Embed a real value as its nearest integer, using functionReal.round
.Embed a string value as a symbol (lame, but the best we can do).
Embed
nil
as the Booleanfalse
.
The definition embedding uses only one definition form,
VScheme.EXP
. Internally, the main embedding should beval exp : VScheme.name KNormalForm.exp -> VScheme.exp
The embedding is described by equations for 𝓔 in the KNF handout. To implement those equations in ML code, you will use these value constructors:
K-normal form object language embeds in \(\mu\)Scheme using v x getglobal(STRING s) setglobal(x, STRING s) @(x₁, …, xₙ) @(x₁, …, xₙ, v) x(x₁, …, xₙ) if x then e₁ else e₂ let x = e in e’ (e₁; e₂) x := e while x := e do e’ and funcode (x₁, …, xₙ) => e Constructing vScheme
LET
forms is a bit of a nuisance, so take advantage of thelet'
helper function that I have provided for you.
The embedding has a few tricky cases:
The
@(x₁, ..., xₙ, v)
form has two special cases, where@
is eithergetglobal
andsetglobal
. Unfortunately, because@
has an abstract type (for good reasons), you cannot pattern match on it. Instead you have to pattern match on the result of calling functionP.name
.The
funcode
form should embed aslambda
even though they don’t have identical semantics.
One note: my prettyprinter condenses nested
let
expressions into Scheme’slet*
form, so if you see alet*
in your output, it is expected.
Projecting Scheme into K-normal form
Project unambiguous vScheme expressions into K-normal form. In file
knproject.sml
, you will find a template for two functions,value
anddef
, with these types:val value : UnambiguousVScheme.value -> KNormalForm.literal val def : UnambiguousVScheme.def -> string KNormalForm.exp Error.error
Internally you will also define
val exp : UnambiguousVScheme.exp -> string KNormalForm.exp Error.error
The projection will enable us to read K-normal form code from disk, using the Scheme lexer and parser, plus the disambiguator you wrote in step 10. We project from
UnambiguousVScheme
because it is easier than going direct fromVScheme
.The projection of expressions is described by equations for 𝓟 in the KNF handout. The left-hand sides that look ambiguous in those equations are exactly the ones that you disambiguated in step 10.
The equations don’t specify a projection of values. But every vScheme value can be represented as a K-normal form literal. And in fact, the
value
direction is actually an embedding, as suggested by its type. A vScheme symbol embeds as a K-normal form string, and the other forms of value embed one for one.Important: The projection functions simply change the representation of vScheme code that is already in K-normal form. You won’t translate general Scheme to K-normal form until module 9. For this module, if Scheme code is not already in K-normal form, the projection should fail for one of two reasons: either a form is outright unacceptable, or the form has a non-variable in a position where a variable is expected. The forms that are outright unacceptable are as follows:
Any
while
loop whose condition does not have the form(let ([x e]) y)
Any
while
loop whose condition has the form(let ([x e]) y)
but in which
x
is different fromy
Any
begin
not of the form(begin e₁ e₂)
Any
let
with more than one bindingAny
letrec
formAnylambda
form except for the special case of a global function definition, which should be handled by thedef
function
In step 18 you’ll write a test for each of these forms, so if you like, you can start writing those tests now.
Even a good form like
if
can be rejected if it doesn’t satisfy the invariants of K-normal form. The key invariant is that many forms are required to be names. One example is the condition in anif
expression: if the condition isn’t a name, then theif
expression isn’t in K-normal form. In this case the projection should fail. It is useful to define a helper function that insists on getting a name:val asName : X.exp -> X.name Error.error (* project into a name; used where KNF expects a name *) = fn X.LOCAL x => succeed x | e => error ("expected a local variable but instead got " ^ (X.whatIs e))
This function will be used to project
FUNCALL
andPRIMCALL
, which should be rejected unless every argument is a variable;IFX
, which should be rejected unless the condition is a variable; andSETGLOBAL
, which should be rejected unless the right-hand side is a variable. (AFUNCALL
will also be rejected unless the function being called is a variable.)Managing all the potential sources of error requires a lot of plumbing, but we can hide the details by using the same abstraction we used in the lexer and parser: an “applicative functor.” The code I give you includes suitable abbreviations:
infix 3 <*> val op <*> = Error.<*> infixr 4 <$> val op <$> = Error.<$> val succeed = Error.succeed val error = Error.ERROR val errorList = Error.list
Using these functions, the bureaucracy of error handling becomes manageable. For example, here’s my code for projecting
if
expressions:fun exp (X.IFX (e1, e2, e3)) = curry3 K.IF <$> asName e1 <*> exp e2 <*> exp e3
You are now ready to define the projection function for expressions.
Project unambiguous vScheme definitions into K-normal form. The last step is to project definitions. This step is easier because the
val
,check-expect
, andcheck-assert
forms simply aren’t in K-normal form; the projection fails. There are really only a couple of special cases, to do with functions.The main case is
X.EXP
; you just pass the payload to your internalexp
function. This case will cover 90% of your needs, and you can write it first.Although the general case is the one to write first, it has to follow this special case: A top-level expression of the form(let ([t (lambda (xs) e)]) (set f t))
should be projected into the K-normal formlet t = funcode xs => ⟦e⟧ in setglobal("f", t)
where⟦e⟧
stands for the projection ofe
(that is,⟦e⟧ = exp e
).One fine point: you have to match on(let ([t (lambda (xs) e)]) (set f t'))
and then insist thatt = t'
.To generate the K-normal formlet
expression, I define an internal helper functionfundef
of typestring KNormalForm.exp -> string KNormalForm.exp
. It takes⟦e⟧
as a parameter and returns a representation of the K-normal form expressionlet t = funcode xs => ⟦e⟧ in setglobal("f", t)
I then handle thelambda
case withfundef <$> exp e
Strictly speaking,
define
isn’t in K-normal form, but it can be handy to pretend. Try inventing at
, like sayval t = "$t1"
and then project
(define f (xs) e)
as
let t = funcode xs => ⟦e⟧ in setglobal("f", t)
You’re now ready to extend the UFT to handle your new functions.
Adding a pass to the Universal Forward Translator
Add a new pass to the UFT. Once you have defined your embedding, disambiguation, and projection functions, you can extend the UFT driver by adding support for K-normal form.
Begin with the handout on the UFT driver. It explains how the UFT driver works and what code has to be written to incorporate a new language.
Define reader function
KN_of_file
, which should work by composing the readerschemexOfFile
with the projection code you wrote in step 15. Its type should beval KN_of_file : instream -> string KNormalForm.exp list error
Define materializer function
KN_text_of
, which should materialize K-normal form. It should look almost exactly likeVS_of
: read the code from a file or bleat that there is no translation.Define emitter function
emitKN
by composingemitScheme
with your embedding function.Add a case to
translate
to handle the case whenoutLang
isKN
.
Testing
Preliminary testing. Your UFT should now be capable of running your embedding and projection. Test by running
uft kn-kn
Here are some cases for you to test:
(+ 2 2) ;; should be rejected (let* ([$r0 2] [$r1 2]) (+ $r0 $r1)) ;; should be OK (if (< n 0) #t #f) ;; reject (let* ([$r0 n] [$r1 0] [r2 (< $r0 $r1)]) (if $r0 #f #t)) ;; accept
Systematic testing.
Create a test file for each of the forms that should be rejected by the projection function in step 14, plus one more file that contains all the good forms.bad.while.scm
Condition in while
isn’t the rightlet
bad.whilevars.scm
In let
inwhile
condition, names don’t matchbad.begin.scm
begin
with wrong number of subexpressionsbad.let.scm
let
with wrong number of bindingsbad.letrec.scm
Any letrec
bad.lambda.scm
Any lambda
that is not a global definition. . good.scm
One each of every good KNF form (as embedded into Scheme) The
good.scm
should contain Scheme versions ofx
,v
,@(x₁, ..., xₙ)
,x(x₁, ..., xₙ)
,if x then e₁ else e₂
let x = e in e'
,(e₁; e₂)
,x := e
, andwhile x := e do e'
forms. It is ok if the “VM operation with literal” andfuncode
/lambda
forms are omitted.
18b. Round-trip testing
Testing success and failure in step 17 is likely sufficient. But if you want to test semantics, you can do it by comparing round-trip results from your UFT/SVM combo with result from the
vscheme
interpreter.For example, if
test.scm
evaluates abegin
, a couple oflet
bindings, and a few primitives, it might look like this:(begin (let* ([tmp 2] [tmp (+ tmp tmp)]) (check tmp 'two-plus-two)) (let ([tmp 4]) (expect tmp 'four)))
This code can be run through
vscheme
without using embedding and projection, and then again with embedding and projection:$ cat test.scm | vscheme The only test passed. $ cat test.scm | uft kn-kn | vscheme The only test passed.
The call to
uft kn-kn
puts my code through embedding and projection, which doesn’t change the test results.
What and how to submit
On Monday, submit the homework. In the
src/uft
directory you’ll find a fileSUBMIT.05
. That file needs to be edited to answer the same questions you answer every week.To submit, you’ll need to copy your working tree to the department servers. We recommend using
rsync
, butscp
also works.Now log into a department server, change to your working tree, and submit your entire
src
directory:provide cs106 hw05 src
On Tuesday, submit your reflection. Create a plain text file
REFLECTION
, which will hold your claims for project points and depth points.For each project point you claim, write the number of the point, plus whatever is called for in the section “How to claim the project points”—usually a few sentences.
Now copy your
REFLECTION
file to a department server and submit it usingprovide
:provide cs106 reflection05 REFLECTION
Reading in depth
Occasionally I’ll suggest reading that may enrich your view of programming-language implementation.
Normal forms. K-normal form is defined by Birkedal, Tofte, and Veilstrup (1996), who use it to infer properties about memory use. It is based on the more famous A-normal form of Flanagan et al. (1993), which is definitely worth reading.
Prettyprinting. My prettyprinter uses an interface designed by Wadler (1998), who pitches his design as an improved version of the original by Hughes (1995). Both papers are well worth reading.
My code uses only Wadler’s interface; my back end is a clean-room implementation of an algorithm by Pugh and Sinofsky (1987). If you like that sort of thing, it’s a nice application of dynamic programming.
Learning outcomes
Outcomes available for points
You can claim a project point for each of the learning outcomes listed here. Instructions about how to claim each point are found below.
Global and local names. You can create a term in which a single name appears as both a global variable and a local variable.
K-normal form. You can write simple K-normal form by hand.
K-normal form embedding. You can, by hand, embed simple K-normal form into Scheme.
Names in K-normal form. You can say which names in the source code show up as what types in K-normal form.
Embedding, projection, and language design. You can justify the fact that K-normal form has fewer expressions but more literals than Scheme.
Eta-expansion. You can say which of the K-normal form invariants is satisfied by the body of an eta-expanded primitive.
UFT driver. Your
uft
builds and understands whatkn-kn
is asking for.Testing. You can explain the results of your tests.
Notation. You can use Oxford brackets to write translation equations.
You can claim a depth point for floating let
out of let
, and two more for improving the Scheme prettyprinter.
Let-floating transformation [1 point]. When variable
y
is chosen so it isdistinct fromthe same asx
andx
or not free ine₃
, the following expressions are equivalent:let x = (let y = e₁ in e₂) in e₃ let y = e₁ in (let x = e₂ in e₃)
But the second expression results in code that is easier to read and that runs faster on some platforms (Peyton Jones, Partain, and Santos 1996). Implement let-floating on K-normal form. See what sort of difference it makes to the generated VM code.
Prettyprinting [2 points]. The indentation and line breaks for the vScheme prettyprinter are just barely tolerable. Improve them.
How to claim the project points
Each of the numbered learning outcomes 1 to N is worth one point, and the points can be claimed as follows:
To claim this point, write an expression in vScheme in which
x
appears both as aGLOBAL
variable and as aLOCAL
variable.To claim this point, write an expression, using the ML-like syntax of K-normal form from the handout on K-normal form, that corresponds to the ML expression
n + 1
, wheren
is a local variable.To claim this point, embed the previous expression into valid vScheme. That is, write an expression in vScheme that is the embedding of an expression in K-normal form; the expression that is embedded must correspond to the Scheme expression
(+ n 1)
, wheren
is a local variable.You must understand all the relevant categories of the ways names can be used in Scheme: formal parameter, local variable, global variable, user-defined function, and primitive function. And in your ML representation of K-normal form, you must understand the use of each of these types:
- Type
'a
- Type
vmop
- Type
literal
To claim the point, for each of the three types listed, say what categories of Scheme names are represented by values of that type.
- Type
Observe that expressions in K-normal form are a subset of vScheme expressions, but literals in K-normal form are a superset of (Unambiguous) Scheme values. It seems strange to have the relation point in opposite directions. To claim this point, answer these two questions:
In a system that is targeting the SVM but does not necessarily want to be locked into translating Scheme, why is it a good idea to have K-normal form expressions be a subset of Scheme expressions?
In a system that is targeting the SVM but does not necessarily want to be locked into translating Scheme, why is it a good idea to have K-normal form literals be a superset of Scheme values?
To claim this point, analyze the eta-expansions produced by function
etaExpand
in filedisambiguate.sml
. This function returns alambda
written in Disambiguated vScheme. Analyze the body of thelambda
and say which of the K-normal-form invariants it satisfies and why.To claim this point, submit code that compiles and runs so that
uft kn-kn
produces a sensible result, better than “I don’t know how to translate.” This point is awarded for running the translation; you get the point even if one or more of the functions have bugs.To claim this point, say in a few short sentences what your tests tell you about what parts of your code do and don’t work. This point is awarded for understanding the results of your tests; you get the point even if your UFT does not yet behave the way you hope.
To demonstrate the Oxford brackets, you should be able to specify a key element of a translation you already know well: the translation from assembly language to object code that you implemented you implemented in the previous module. This translation is specified by a function
\(\mathcal A\) :
AssemblyCode.instr -> int -> (name -> int) -> ObjectCode.instr
The parameter of type
int
is the position that the instruction occupies in the instruction stream. The parameter of typename -> int
is the environment \(\rho\); it is the mathematician’s way of writing an environment of typeint Env.env
.To demonstrate ability with Oxford brackets, it is sufficient to write an equation that describes the translation of just one instruction: the
GOTO
instruction. When function \(\mathcal A\) is given an assembly-languageGOTO
with a label, it turns an object-languageGOTO
with a PC-relative offset. To claim the point, use Oxford brackets to write an equation that describes just the translation of theGOTO
instruction. Notate the position parameter as \(k\) and the environment parameter as \(\rho\).