Here we start our long journey from the machine level to the high level. “We can so have nice things!”
The nice thing we’d ultimately like to have is a typical scripting language: one that’s dynamically typed and that offers both functional and imperative features. From there we can consider adding fancy bits, like objects or a type system.
Ideas of Knormal form
Our first step is called Knormal form. Its key ideas are as follows:
Knormal form is a language of expressions that’s going to look like a subset of Scheme: a typical expression is evaluated for a value, but some expressions may also be evaluated for a side effect.
Knormal form includes two forms of function call: an ordinary form, which compiles to a
call
instruction (to be implemented in module 7), and a special primitive form, which compiles to whatever VM instruction goes with the primitive.In Knormal form, expressions and calls are not nested. VIEWED AS CONFUSING; FIX ME A function (ordinary or primitive) can be called only if it is named, and all the arguments of a call are named.^{1}
Names are different. In Knormal form, a name always refers to a local variable bound with
let
, or to a parameter. A global vScheme variable is handled by writing the variable’s name as a literal value, which is then passed as a parameter to primitivesetglobal
orgetglobal
. In the SVM loader, each value is converted into an index, which refers to a slot on the VM’s globalvariable table.Knormal form is shockingly close to machine code. Just replace each name with a machine register, and it’s almost assembly code.
There are no definition forms! A definition is coded as an assignment to a global variable.
Why is Knormal form a language of expressions, not statements or instructions? Because our compiler (UFT) is going to translate between programs using the same “substitute equals for equals” techniques we used to do calculational proofs in CS 105. These techniques enable us to avoid all the fancy compiler stuff that goes under the heading of “dataflow analysis.”
The syntactic structure of Knormal form
The syntax of Knormal form refers to the usual names, expressions, and literal values, but it also refers to virtualmachine operators. A virtualmachine operator, written using metavariable @
, is a name that looks like a function, but an application of a virtualmachine operator does not compile to a function call; it compiles to a single instruction on the SVM.
e 
Expression 
x 
Name 
v 
Literal (number, string, Boolean, and so on) 
@ 
Virtualmachine operator (might be + or cons , for example) 
Knormal form is a language of expressions—effectively a subset of vScheme—which satisfies some invariants and restrictions that keep the syntactic structure simple enough to generate code from without requiring a ton of semantic analysis:
Every argument to every function call and VM operator has a name—except in one special case, where one argument may be a literal value.
The condition in an
if
expression orwhile
loop has a name.There is only one form of
let
, and it binds exactly one variable.^{2}The
set
and variable forms refer only to local variables.^{3} Global variables are read and written by VM operatorsgetglobal
andsetglobal
.The
begin
form is restricted to exactly two expressions.The
funcode
form looks like alambda
, but it isn’t: afuncode
may not have any free names! (That is, all its free variables are global variables.)
The syntactic structure of Knormal form is so central that I present it in two notations:
 An MLlike notation, which is easy to read, write, and understand
 A Schemelike notation, which is easy to relate to Scheme
The MLlike notation looks like this:^{4}
e ::= v  literal value
 x  name
 @(x₁, ..., xₙ)  VM operation
 @(x₁, ..., xₙ, v)  VM operation with one literal
 x(x₁, ..., xₙ)  function call
 if x then e₁ else e₂
 let x = e in e'
 (e₁; e₂)  sequence
 x := e  assignment to local variable
 while x := e do e'  loop with named condition
 funcode (x₁, ... xₙ) => e  can't have free names
The “VM operation with one literal” form is going to be useful for VM instructions that use the R1LIT
parser. Depending on how you’ve designed your SVM, those instructions might include check
, expect
, and maybe also setglobal
and getglobal
.
Examples of the MLlike notation might look like this:
Source code:
if n < 0 then negated n else n
Knormal form:
let t1 = 0 in let t2 = <(n, t1) in if t2 then let t3 = getglobal("negated") in t3(n) else n
Source code:
x :: revAppend (xs, ys)
Knormal form:
let t1 = getglobal("revAppend") in let t2 = t1(xs, ys) in cons(x, t2)
The Schemelike notation looks like this:
e ::= v  literal value
 x  name
 (@ x₁ ... xₙ)  VM operation
 (@ x₁ ... xₙ v)  VM operation with one literal
 (x x₁ ... xₙ)  function call
 (if x e₁ e₂)
 (let ([x e]) e')
 (begin e₁ e₂)
 (set x e)
 (while (let ([x e]) x) e')  loop with named condition
 (funcode (x₁ ... xₙ) e)  can't have free names
To make the while
form truly a subset of Scheme, I have to duplicate the x
. But like the ML notation while x := e do e'
, your internal representation should have just one copy of x
.
Examples of the Schemelike notation might look like this:
Scheme source code: (if (< n 0) (negated n) n)
Knormal form, assuming n is a local variable:
(let ([t1 0]) (let ([t2 (< n t1)]) (if t2 (let ([t3 (getglobal 'negated)]) (t3 n)) n)))
Equivalent vScheme:
The concrete syntax above looks like valid vScheme, but it isn’t: vScheme doesn’t have a getglobal operation. In vScheme, the way to refer to a global variable is just to use its name. It is also conventional to write nested
let
expressions using an equivalentlet*
. So the example shown above could be projected from a vScheme expression like this:(let* ([t1 0] [t2 (< n t1)]) (if t2 (let ([t3 negated]) (t3 n)) n))
To test this expression, it would have to be wrapped in code that binds
n
locally, like (let ([n ’uninitialized]) …).Scheme source code: (cons x (append xs ys))
Knormal form, allowing let*, and assuming x, xs, and ys are local variables
(let* ([t1 (getglobal 'append)] [t2 (t1 xs ys)]) (cons x t2))
Equivalent vScheme:
(let* ([t1 append] [t2 (t1 xs ys)]) (cons x t2))
Relationship between vScheme and Knormal form
Knormal form corresponds to a firstorder subset of vScheme. Here’s a rough correspondence (the meaning of the colors is explained below):
vScheme AST  Knormal form 

v 

x or @ or getglobal (you disambiguate in lab) 

set or setglobal 

if 

(while (let ...) ...) 

@(...) or x(...) (you disambiguate in lab) 

let 

setglobal 

setglobal and funcode 

funcode , provided all free variables are global variables 
Using this correspondence, it should be easy to embed a Knormal into μScheme code—that means, change the representation of the code from Knormal form to μScheme. Projecting μScheme as a Knormal form, however, requires more information: to know what Knormal form corresponds to VAR
, SET
, or APPLY
, each name and call in the source code has to be disambiguated. That will be done in lab. The embedding and projections themselves are a major part of your homework.
Terminology and notations for embedding and projection
Embedding and projection, like many other translations, can be expressed using recursion equations. Writing such a translation is challenging because there are three languages in play:
The language in which the translator itself is implemented is the
. For us, the metalanguage is likely to be Standard ML. And when a translation is specified in a book, a paper, or a handout, the metalanguage is typically mathematics.The language being translated from is the source language.
The language being translated to is the target language.
Both source and target languages are object languages; “object language” is the word for a language that is manipulated by a translator. In many translations, there is just one object language; the source language and target language are the same. Such translations are often called “optimizations.”
Having all these languages presents a notational problem: how to tell them apart. I use a notational trick commonly used in programming languages: to quote any term of an object language, I give it a colored background, and I wrap the term in “Oxford brackets”: ⟦...⟧
for a sourcelanguage term and ⟦...⟧
for a targetlanguage term. For example, if I want to say how to translate an if
expression from Knormal form into vScheme, I’ll quote the if
expression as ⟦if x then e₁ else e₂⟧
.^{5} The Oxford brackets are not that different from the double quote marks you are used to for writing string literals, and they can be used for both source and target languages.
The metalanguage is not supposed to be quoted within Oxford brackets. But on rare occasions we want to put metalanguage code inside the Oxford brackets. This exception requires some sort of notation to “escape” the act of being quoted. I use $(, where the ) is code in the metalanguage. This notation, which is borrowed from the POSIX shell language, is a form of antiquotation. The existence of the antiquotation technically makes the Oxford brackets a “quasiquotation,” which means “quotation that can be escaped.”
Quasiquotation isn’t just for theory; it is supported by a number of programming languages. Lisp was the first; all Lisp dialects allow you to quasiquote Sexpressions. Standard ML of New Jersey includes a somewhat rudimentary mechanism. But the really nice quasiquoter is the one for Haskell designed and implemented by Geoff Mainland. It has been used effectively in all sorts of language implementations.
Equations of embedding and projection
Knormal form is embedded into Scheme by a function 𝓔 (for “embed”) that satisfies the laws below. In each law, the lefthand side is metalanguage (with quoted objectlanguage terms), and the righthand side is an unholy mix of object language and metalanguage.^{6}
Knormal form  = Embedding into Scheme 

𝓔⟦v⟧  = … not written here … 
𝓔⟦x⟧  = x 
𝓔⟦if x then e₁ else e₂⟧  = (if x 𝓔⟦e₁⟧ 𝓔⟦e₂⟧) 
𝓔⟦let x = e in e’⟧  = (let ([x 𝓔⟦e⟧]) 𝓔⟦e’⟧) 
𝓔⟦(setglobal(x, X); x)⟧  = (set X x) 
𝓔⟦(e₁; e₂)⟧  = (begin 𝓔⟦e₁⟧ 𝓔⟦e₂⟧) 
𝓔⟦(x := e)⟧  = (set x 𝓔⟦e⟧) 
𝓔⟦while x := e do e’⟧  = (while (let ([x 𝓔⟦e⟧]) x) 𝓔⟦e’⟧) 
𝓔⟦funcode (x₁, … xₙ) => e⟧  = (lambda (x₁ ⋯ xₙ) => 𝓔⟦e⟧) 
The VMoperation form has some special cases, and it requires some antiquotation:
Knormal form  = Embedding into Scheme 

𝓔⟦getglobal(v)⟧  = $( ) 
𝓔⟦setglobal(x, v)⟧  = (set $( ) x) 
𝓔⟦@(x₁, …, xₙ)⟧  = ($( ) x₁ ⋯ xₙ) 
𝓔⟦@(x₁, …, xₙ, v)⟧  = ($(v⟧) ) x₁ ⋯ xₙ 𝓔⟦ 
Function 𝓔⟦v⟧, is not written down here; it’s what you’ll implement in function .
converts a string literal to a name; on other literals, it is not defined. The embedding of values,And to render embedded code easy to read, I recommend adding this special case:
𝓔⟦let x = e in x⟧ = 𝓔⟦e⟧
The special case is useful because the code on the left is what your compiler will have to generate for a call to an unknown function.
Projection 𝓟 takes things in the other direction. It is very much a partial function; when given a Scheme form that doesn’t match any lefthand side, 𝓟 fails.
Scheme  = Projection into Knormal form 

𝓟⟦v⟧  = … not written here … 
𝓟⟦x⟧  = x, if x is local 
𝓟⟦x⟧  = getglobal($(, if ))x is global 
𝓟⟦(if x e₁ e₂)⟧  = if x then 𝓟⟦e₁⟧ else 𝓟⟦e₂⟧ 
𝓟⟦(let ([x e]) e’)⟧  = let x = 𝓟⟦e⟧ in 𝓟⟦e’⟧ 
𝓟⟦(begin e₁ e₂)⟧  = (𝓟⟦e₁⟧; 𝓟⟦e₂⟧) 
𝓟⟦(set x e)⟧  = (x := 𝓟⟦e⟧), if x is local 
𝓟⟦(set x x’)⟧  = (setglobal(x’, $(, if )); x’)x is global 
𝓟⟦(x x₁ ⋯ xₙ)⟧  = @(x₁, …, xₙ), 
where x names primitive function @ and @ is not check or expect  
𝓟⟦(x x₁ v)⟧  = @(x₁, v), where x names primitive check or expect 
𝓟⟦(x x₁ ⋯ xₙ)⟧  = x(x₁, …, xₙ), otherwise 
𝓟⟦(while (let ([x e]) x) e’)⟧  = while x := 𝓟⟦e⟧ do 𝓟⟦e’⟧ 
𝓟⟦(lambda (x₁ ⋯ xₙ) => e)⟧  = funcode (x₁, … xₙ) => 𝓟⟦e⟧, 
provided all the free variables of the lambda are global 
On the lefthand sides, notice the two x
forms, two set
forms,^{7} two functionapplication forms. These are exactly the forms that are disambiguated in lab—so your projection will be able to tell them apart by pattern matching on the syntax.
Also note the side condition on lambda. A Scheme lambda projects to a Knormal form funcode only if neither of them has any free variables—or more precisely, all the free variables are global variables. Your implementation can skip this check for now.^{8}
Semantics of funcode
The funcode form requires more explanation. From a semantic point of view, there are two ways to handle it: either it behaves like a classic Lisp lambda
from 1958, or it behaves like a closure that captures only global variables. The latter form is what we want, and it is specified by the evaluation rule
\[\frac{}{\langle \mathtt{funcode}\; \mathit{xs}\; \mathtt{=>}\; e, \rho, \sigma \rangle \Downarrow \langle ⦇ \lambda \mathit{xs} . e, \rho_0 ⦈, \sigma \rangle}\text,\]
where \(\rho_0\) is the initial environment, i.e., the one mapping every global variable to its location in the globalvariable table.