Here we start our long journey from the machine level to the high level. “We can so have nice things!”

The nice thing we’d ultimately like to have is a typical scripting language: one that’s dynamically typed and that offers both functional and imperative features. From there we can consider adding fancy bits, like objects or a type system.

Ideas of K-normal form

Our first step is called K-normal form. Its key ideas are as follows:

K-normal form is a language of expressions that’s going to look like a subset of Scheme: a typical expression is evaluated for a value, but some expressions may also be evaluated for a side effect.
K-normal form includes two forms of function call: an ordinary form, which compiles to a call instruction (to be implemented in module 7), and a special primitive form, which compiles to whatever VM instruction goes with the primitive.
In K-normal form, expressions and calls are not nested. VIEWED AS CONFUSING; FIX ME A function (ordinary or primitive) can be called only if it is named, and all the arguments of a call are named.¹
Names are different. In K-normal form, a name always refers to a local variable bound with let, or to a parameter. A global vScheme variable is handled by writing the variable’s name as a literal value, which is then passed as a parameter to primitive setglobal or getglobal. In the SVM loader, each value is converted into an index, which refers to a slot on the VM’s global-variable table.
K-normal form is shockingly close to machine code. Just replace each name with a machine register, and it’s almost assembly code.
There are no definition forms! A definition is coded as an assignment to a global variable.

Why is K-normal form a language of expressions, not statements or instructions? Because our compiler (UFT) is going to translate between programs using the same “substitute equals for equals” techniques we used to do calculational proofs in CS 105. These techniques enable us to avoid all the fancy compiler stuff that goes under the heading of “dataflow analysis.”

The syntactic structure of K-normal form

The syntax of K-normal form refers to the usual names, expressions, and literal values, but it also refers to virtual-machine operators. A virtual-machine operator, written using metavariable @, is a name that looks like a function, but an application of a virtual-machine operator does not compile to a function call; it compiles to a single instruction on the SVM.

Metavariables of K-normal form
`e`	Expression
`x`	Name
`v`	Literal (number, string, Boolean, and so on)
`@`	Virtual-machine operator (might be `+` or `cons`, for example)

K-normal form is a language of expressions—effectively a subset of vScheme—which satisfies some invariants and restrictions that keep the syntactic structure simple enough to generate code from without requiring a ton of semantic analysis:

Every argument to every function call and VM operator has a name—except in one special case, where one argument may be a literal value.
The condition in an if expression or while loop has a name.
There is only one form of let, and it binds exactly one variable.²
The set and variable forms refer only to local variables.³ Global variables are read and written by VM operators getglobal and setglobal.
The begin form is restricted to exactly two expressions.
The funcode form looks like a lambda, but it isn’t: a funcode may not have any free names! (That is, all its free variables are global variables.)

The syntactic structure of K-normal form is so central that I present it in two notations:

An ML-like notation, which is easy to read, write, and understand
A Scheme-like notation, which is easy to relate to Scheme

The ML-like notation looks like this:⁴

e  ::= v                           -- literal value
    |  x                           -- name
    |  @(x₁, ..., xₙ)              -- VM operation
    |  @(x₁, ..., xₙ, v)           -- VM operation with one literal
    |  x(x₁, ..., xₙ)              -- function call
    |  if x then e₁ else e₂        
    |  let x = e in e'             
                                   
    |  (e₁; e₂)                    -- sequence
    |  x := e                      -- assignment to local variable
    |  while x := e do e'          -- loop with named condition

    |  funcode (x₁, ... xₙ) => e   -- can't have free names

The “VM operation with one literal” form is going to be useful for VM instructions that use the R1LIT parser. Depending on how you’ve designed your SVM, those instructions might include check, expect, and maybe also setglobal and getglobal.

Examples of the ML-like notation might look like this:

Source code: if n < 0 then negated n else n

K-normal form:

let t1 = 0 in
let t2 = <(n, t1) in
if t2 then
  let t3 = getglobal("negated")
  in  t3(n)
else
  n

Source code: x :: revAppend (xs, ys)

K-normal form:

let t1 = getglobal("revAppend") in
let t2 = t1(xs, ys) in
cons(x, t2)

The Scheme-like notation looks like this:

e  ::= v                            -- literal value
    |  x                            -- name
    |  (@ x₁ ... xₙ)                -- VM operation
    |  (@ x₁ ... xₙ v)              -- VM operation with one literal
    |  (x x₁ ... xₙ)                -- function call
    |  (if x e₁ e₂)
    |  (let ([x e]) e')

    |  (begin e₁ e₂)
    |  (set x e)
    |  (while (let ([x e]) x) e')   -- loop with named condition

    |  (funcode (x₁ ... xₙ) e)      -- can't have free names

To make the while form truly a subset of Scheme, I have to duplicate the x. But like the ML notation while x := e do e', your internal representation should have just one copy of x.

Examples of the Scheme-like notation might look like this:

Scheme source code: (if (< n 0) (negated n) n)

K-normal form, assuming n is a local variable:
```
(let ([t1 0])
  (let ([t2 (< n t1)])
    (if t2
        (let ([t3 (getglobal 'negated)])
          (t3 n))
        n)))
```
Equivalent vScheme:

The concrete syntax above looks like valid vScheme, but it isn’t: vScheme doesn’t have a getglobal operation. In vScheme, the way to refer to a global variable is just to use its name. It is also conventional to write nested let expressions using an equivalent let*. So the example shown above could be projected from a vScheme expression like this:
```
(let* ([t1 0]
       [t2 (< n t1)])
  (if t2
      (let ([t3 negated])
        (t3 n))
      n))
```
To test this expression, it would have to be wrapped in code that binds n locally, like (let ([n ’uninitialized]) …).
Scheme source code: (cons x (append xs ys))

K-normal form, allowing let*, and assuming x, xs, and ys are local variables
```
(let* ([t1 (getglobal 'append)]
       [t2 (t1 xs ys)])
  (cons x t2))
```
Equivalent vScheme:
```
(let* ([t1 append]
       [t2 (t1 xs ys)])
  (cons x t2))
```

Relationship between vScheme and K-normal form

K-normal form corresponds to a first-order subset of vScheme. Here’s a rough correspondence (the meaning of the colors is explained below):

vScheme AST	K-normal form
`LITERAL`	`v`
`VAR`	`x` or `@` or `getglobal` (you disambiguate in lab)
`SET`	`set` or `setglobal`
`IFX`	`if`
`WHILE`	`(while (let ...) ...)`
`APPLY`	`@(...)` or `x(...)` (you disambiguate in lab)
`LETX`	`let`

`VAL`	`setglobal`
`DEFINE`	`setglobal` and `funcode`

`LAMBDA`	`funcode`, provided all free variables are global variables

Using this correspondence, it should be easy to embed a K-normal into μScheme code—that means, change the representation of the code from K-normal form to μScheme. Projecting μScheme as a K-normal form, however, requires more information: to know what K-normal form corresponds to VAR, SET, or APPLY, each name and call in the source code has to be disambiguated. That will be done in lab. The embedding and projections themselves are a major part of your homework.

Terminology and notations for embedding and projection

Embedding and projection, like many other translations, can be expressed using recursion equations. Writing such a translation is challenging because there are three languages in play:

The language in which the translator itself is implemented is the metalanguage. For us, the metalanguage is likely to be Standard ML. And when a translation is specified in a book, a paper, or a handout, the metalanguage is typically mathematics.
The language being translated from is the source language.
The language being translated to is the target language.

Both source and target languages are object languages; “object language” is the word for a language that is manipulated by a translator. In many translations, there is just one object language; the source language and target language are the same. Such translations are often called “optimizations.”

Having all these languages presents a notational problem: how to tell them apart. I use a notational trick commonly used in programming languages: to quote any term of an object language, I give it a colored background, and I wrap the term in “Oxford brackets”: ⟦...⟧ for a source-language term and ⟦...⟧ for a target-language term. For example, if I want to say how to translate an if expression from K-normal form into vScheme, I’ll quote the if expression as ⟦if x then e₁ else e₂⟧.⁵ The Oxford brackets are not that different from the double quote marks you are used to for writing string literals, and they can be used for both source and target languages.

The metalanguage is not supposed to be quoted within Oxford brackets. But on rare occasions we want to put metalanguage code inside the Oxford brackets. This exception requires some sort of notation to “escape” the act of being quoted. I use $(...), where the ... is code in the metalanguage. This notation, which is borrowed from the POSIX shell language, is a form of antiquotation. The existence of the antiquotation technically makes the Oxford brackets a “quasiquotation,” which means “quotation that can be escaped.”

Quasiquotation isn’t just for theory; it is supported by a number of programming languages. Lisp was the first; all Lisp dialects allow you to quasiquote S-expressions. Standard ML of New Jersey includes a somewhat rudimentary mechanism. But the really nice quasiquoter is the one for Haskell designed and implemented by Geoff Mainland. It has been used effectively in all sorts of language implementations.

Equations of embedding and projection

K-normal form is embedded into Scheme by a function 𝓔 (for “embed”) that satisfies the laws below. In each law, the left-hand side is metalanguage (with quoted object-language terms), and the right-hand side is an unholy mix of object language and metalanguage.⁶

K-normal form	= Embedding into Scheme
𝓔⟦v⟧	= … not written here …
𝓔⟦x⟧	= x
𝓔⟦if x then e₁ else e₂⟧	= (if x 𝓔⟦e₁⟧ 𝓔⟦e₂⟧)
𝓔⟦let x = e in e’⟧	= (let ([x 𝓔⟦e⟧]) 𝓔⟦e’⟧)
𝓔⟦(setglobal(x, X); x)⟧	= (set X x)
𝓔⟦(e₁; e₂)⟧	= (begin 𝓔⟦e₁⟧ 𝓔⟦e₂⟧)
𝓔⟦(x := e)⟧	= (set x 𝓔⟦e⟧)
𝓔⟦while x := e do e’⟧	= (while (let ([x 𝓔⟦e⟧]) x) 𝓔⟦e’⟧)
𝓔⟦funcode (x₁, … xₙ) => e⟧	= (lambda (x₁ ⋯ xₙ) => 𝓔⟦e⟧)

The VM-operation form has some special cases, and it requires some antiquotation:

K-normal form	= Embedding into Scheme
𝓔⟦getglobal(v)⟧	= $(nameFrom v)
𝓔⟦setglobal(x, v)⟧	= (set $(nameFrom v) x)
𝓔⟦@(x₁, …, xₙ)⟧	= ($(Primitive.name @) x₁ ⋯ xₙ)
𝓔⟦@(x₁, …, xₙ, v)⟧	= ($(Primitive.name @) x₁ ⋯ xₙ 𝓔⟦v⟧)

Function nameFrom converts a string literal to a name; on other literals, it is not defined. The embedding of values, 𝓔⟦v⟧, is not written down here; it’s what you’ll implement in function EmbedKN.value.

And to render embedded code easy to read, I recommend adding this special case:

𝓔⟦let x = e in x⟧ = 𝓔⟦e⟧

The special case is useful because the code on the left is what your compiler will have to generate for a call to an unknown function.

Projection 𝓟 takes things in the other direction. It is very much a partial function; when given a Scheme form that doesn’t match any left-hand side, 𝓟 fails.

Scheme	= Projection into K-normal form
𝓟⟦v⟧	= … not written here …
𝓟⟦x⟧	= x, if x is local
𝓟⟦x⟧	= getglobal($(STRING x)), if x is global
𝓟⟦(if x e₁ e₂)⟧	= if x then 𝓟⟦e₁⟧ else 𝓟⟦e₂⟧
𝓟⟦(let ([x e]) e’)⟧	= let x = 𝓟⟦e⟧ in 𝓟⟦e’⟧
𝓟⟦(begin e₁ e₂)⟧	= (𝓟⟦e₁⟧; 𝓟⟦e₂⟧)
𝓟⟦(set x e)⟧	= (x := 𝓟⟦e⟧), if x is local
𝓟⟦(set x x’)⟧	= (setglobal(x’, $(STRING x)); x’), if x is global
𝓟⟦(x x₁ ⋯ xₙ)⟧	= @(x₁, …, xₙ),
	where x names primitive function @ and @ is not check or expect
𝓟⟦(x x₁ v)⟧	= @(x₁, v), where x names primitive check or expect
𝓟⟦(x x₁ ⋯ xₙ)⟧	= x(x₁, …, xₙ), otherwise
𝓟⟦(while (let ([x e]) x) e’)⟧	= while x := 𝓟⟦e⟧ do 𝓟⟦e’⟧
𝓟⟦(lambda (x₁ ⋯ xₙ) => e)⟧	= funcode (x₁, … xₙ) => 𝓟⟦e⟧,
	provided all the free variables of the lambda are global

On the left-hand sides, notice the two x forms, two set forms,⁷ two function-application forms. These are exactly the forms that are disambiguated in lab—so your projection will be able to tell them apart by pattern matching on the syntax.

Also note the side condition on lambda. A Scheme lambda projects to a K-normal form funcode only if neither of them has any free variables—or more precisely, all the free variables are global variables. Your implementation can skip this check for now.⁸

Semantics of funcode

The funcode form requires more explanation. From a semantic point of view, there are two ways to handle it: either it behaves like a classic Lisp lambda from 1958, or it behaves like a closure that captures only global variables. The latter form is what we want, and it is specified by the evaluation rule

\[\frac{}{\langle \mathtt{funcode}\; \mathit{xs}\; \mathtt{=>}\; e, \rho, \sigma \rangle \Downarrow \langle ⦇ \lambda \mathit{xs} . e, \rho_0 ⦈, \sigma \rangle}\text,\]

where $\rho_0$ is the initial environment, i.e., the one mapping every global variable to its location in the global-variable table.

With one exception, which is a performance cheat.↩
In module 10 we will add a recursive let form.↩
Function parameters are also considered local variables.↩
If your browser is not rendering all the subscripts, it might help to install the Deja Vu fonts.↩
The Oxford brackets aren’t pronounced. If we must call attention to the notation, we say that the if expression is “quoted.”↩
Technically, a right-hand side should be notated by wrapping the whole thing in Oxford brackets and antiquoting the recursive call to 𝓔. But the result would be an unreadable mess, so nobody does it.↩
The SETGLOBAL form is executed for both side effect and value, so it projects as a two-expression sequence: the setglobal primitive for side effect, and the register (from which the global is set) for value.↩
In module 10, where we actually implement lambda, free variables are far more relevant—and you’ll implement a function that finds them. Until then, your projection function can take it on faith that a lambda has no free variables.↩