K-Normal Form

add a SETGLOBAL form to KNF

Consider these two vmop forms (but TBH, the benefit looks marginal). Just alert students to the option; don’t necessarily recommend it.

|  @(x₁, ..., xₙ)              -- VM operation
|  @(x₁, ..., xₙ, v)           -- VM operation with one literal

It's possible to represent both forms using a single value constructor that carries a `literal option`, but the benefits look marginal.

Here we start our long journey from the machine level to the high level. “We can so have nice things!”

The nice thing we’d ultimately like to have is a typical scripting language: one that’s dynamically typed and that offers both functional and imperative features. From there we can consider adding fancy bits, like objects or a type system.

Ideas of K-normal form

Our first step is called K-normal form. Its key ideas are as follows:

Why is K-normal form a language of expressions, not statements or instructions? Because our compiler (UFT) is going to translate between programs using the same “substitute equals for equals” techniques we used to do calculational proofs in CS 105. These techniques enable us to avoid all the fancy compiler stuff that goes under the heading of “dataflow analysis.”

The syntactic structure of K-normal form

The syntax of K-normal form refers to the usual names, expressions, and literal values, but it also refers to virtual-machine operators. A virtual-machine operator, written using metavariable @, is a name that looks like a function, but an application of a virtual-machine operator does not compile to a function call; it compiles to a single instruction on the SVM.

Metavariables of K-normal form
e Expression
x Name
v Literal (number, string, Boolean, and so on)
@ Virtual-machine operator (might be + or cons, for example)

K-normal form is a language of expressions—effectively a subset of vScheme—which satisfies some invariants and restrictions that keep the syntactic structure simple enough to generate code from without requiring a ton of semantic analysis:

The syntactic structure of K-normal form is so central that I present it in two notations:

The ML-like notation looks like this:4

e  ::= v                           -- literal value
    |  x                           -- name
    |  @(x₁, ..., xₙ)              -- VM operation
    |  @(x₁, ..., xₙ, v)           -- VM operation with one literal
    |  x(x₁, ..., xₙ)              -- function call
    |  if x then e₁ else e₂        
    |  let x = e in e'             
                                   
    |  (e₁; e₂)                    -- sequence
    |  x := e                      -- assignment to local variable
    |  while x := e do e'          -- loop with named condition

    |  funcode (x₁, ... xₙ) => e   -- can't have free names

The “VM operation with one literal” form is going to be useful for VM instructions that use the R1LIT parser. Depending on how you’ve designed your SVM, those instructions might include check, expect, and maybe also setglobal and getglobal.

Examples of the ML-like notation might look like this:

The Scheme-like notation looks like this:

e  ::= v                            -- literal value
    |  x                            -- name
    |  (@ x₁ ... xₙ)                -- VM operation
    |  (@ x₁ ... xₙ v)              -- VM operation with one literal
    |  (x x₁ ... xₙ)                -- function call
    |  (if x e₁ e₂)
    |  (let ([x e]) e')

    |  (begin e₁ e₂)
    |  (set x e)
    |  (while (let ([x e]) x) e')   -- loop with named condition

    |  (funcode (x₁ ... xₙ) e)      -- can't have free names

To make the while form truly a subset of Scheme, I have to duplicate the x. But like the ML notation while x := e do e', your internal representation should have just one copy of x.

Examples of the Scheme-like notation might look like this:

Relationship between vScheme and K-normal form

K-normal form corresponds to a first-order subset of vScheme. Here’s a rough correspondence (the meaning of the colors is explained below):

vScheme AST K-normal form
LITERAL v
VAR x or @ or getglobal (you disambiguate in lab)
SET set or setglobal
IFX if
WHILE (while (let ...) ...)
APPLY @(...) or x(...) (you disambiguate in lab)
LETX let
   
VAL setglobal
DEFINE setglobal and funcode
   
LAMBDA funcode, provided all free variables are global variables

Using this correspondence, it should be easy to embed a K-normal into μScheme code—that means, change the representation of the code from K-normal form to μScheme. Projecting μScheme as a K-normal form, however, requires more information: to know what K-normal form corresponds to VAR, SET, or APPLY, each name and call in the source code has to be disambiguated. That will be done in lab. The embedding and projections themselves are a major part of your homework.

Terminology and notations for embedding and projection

Embedding and projection, like many other translations, can be expressed using recursion equations. Writing such a translation is challenging because there are three languages in play:

Both source and target languages are object languages; “object language” is the word for a language that is manipulated by a translator. In many translations, there is just one object language; the source language and target language are the same. Such translations are often called “optimizations.”

Having all these languages presents a notational problem: how to tell them apart. I use a notational trick commonly used in programming languages: to quote any term of an object language, I give it a colored background, and I wrap the term in “Oxford brackets”: ⟦...⟧ for a source-language term and ⟦...⟧ for a target-language term. For example, if I want to say how to translate an if expression from K-normal form into vScheme, I’ll quote the if expression as ⟦if x then e₁ else e₂⟧.5 The Oxford brackets are not that different from the double quote marks you are used to for writing string literals, and they can be used for both source and target languages.

The metalanguage is not supposed to be quoted within Oxford brackets. But on rare occasions we want to put metalanguage code inside the Oxford brackets. This exception requires some sort of notation to “escape” the act of being quoted. I use $(...), where the ... is code in the metalanguage. This notation, which is borrowed from the POSIX shell language, is a form of antiquotation. The existence of the antiquotation technically makes the Oxford brackets a “quasiquotation,” which means “quotation that can be escaped.”

Quasiquotation isn’t just for theory; it is supported by a number of programming languages. Lisp was the first; all Lisp dialects allow you to quasiquote S-expressions. Standard ML of New Jersey includes a somewhat rudimentary mechanism. But the really nice quasiquoter is the one for Haskell designed and implemented by Geoff Mainland. It has been used effectively in all sorts of language implementations.

Equations of embedding and projection

K-normal form is embedded into Scheme by a function 𝓔 (for “embed”) that satisfies the laws below. In each law, the left-hand side is metalanguage (with quoted object-language terms), and the right-hand side is an unholy mix of object language and metalanguage.6

K-normal form Embedding into Scheme
𝓔⟦v = … not written here …
𝓔⟦x = x
𝓔⟦if x then e₁ else e₂ = (if x 𝓔⟦e₁⟧ 𝓔⟦e₂⟧)
𝓔⟦let x = e in e’ = (let ([x 𝓔⟦e⟧]) 𝓔⟦e’⟧)
𝓔⟦(setglobal(x, X); x) = (set X x)
𝓔⟦(e₁; e₂) = (begin 𝓔⟦e₁⟧ 𝓔⟦e₂⟧)
𝓔⟦(x := e) = (set x 𝓔⟦e⟧)
𝓔⟦while x := e do e’ = (while (let ([x 𝓔⟦e⟧]) x) 𝓔⟦e’⟧)
𝓔⟦funcode (x₁, … xₙ) => e = (lambda (x₁ ⋯ xₙ) => 𝓔⟦e⟧)

The VM-operation form has some special cases, and it requires some antiquotation:

K-normal form Embedding into Scheme
𝓔⟦getglobal(v) = $(nameFrom v)
𝓔⟦setglobal(x, v) = (set $(nameFrom v) x)
𝓔⟦@(x₁, …, xₙ) = ($(Primitive.name @) x₁ ⋯ xₙ)
𝓔⟦@(x₁, …, xₙ, v) = ($(Primitive.name @) x₁ ⋯ xₙ 𝓔⟦v⟧)

Function nameFrom converts a string literal to a name; on other literals, it is not defined. The embedding of values, 𝓔⟦v, is not written down here; it’s what you’ll implement in function EmbedKN.value.

And to render embedded code easy to read, I recommend adding this special case:

𝓔⟦let x = e in x⟧ = 𝓔⟦e

The special case is useful because the code on the left is what your compiler will have to generate for a call to an unknown function.

Projection 𝓟 takes things in the other direction. It is very much a partial function; when given a Scheme form that doesn’t match any left-hand side, 𝓟 fails.

Scheme Projection into K-normal form
𝓟⟦v = … not written here …
𝓟⟦x = x, if x is local
𝓟⟦x = getglobal($(STRING x)), if x is global
𝓟⟦(if x e₁ e₂) = if x then 𝓟⟦e₁⟧ else 𝓟⟦e₂
𝓟⟦(let ([x e]) e’) = let x = 𝓟⟦e⟧ in 𝓟⟦e’
𝓟⟦(begin e₁ e₂) = (𝓟⟦e₁⟧; 𝓟⟦e₂⟧)
𝓟⟦(set x e) = (x := 𝓟⟦e⟧), if x is local
𝓟⟦(set x x’) = (setglobal(x’, $(STRING x)); x’), if x is global
𝓟⟦(x x₁ ⋯ xₙ) = @(x₁, …, xₙ),
  where x names primitive function @ and @ is not check or expect
𝓟⟦(x x₁ v) = @(x₁, v), where x names primitive check or expect
𝓟⟦(x x₁ ⋯ xₙ) = x(x₁, …, xₙ), otherwise
𝓟⟦(while (let ([x e]) x) e’) = while x := 𝓟⟦e⟧ do 𝓟⟦e’
𝓟⟦(lambda (x₁ ⋯ xₙ) => e) = funcode (x₁, … xₙ) => 𝓟⟦e,
    provided all the free variables of the lambda are global

On the left-hand sides, notice the two x forms, two set forms,7 two function-application forms. These are exactly the forms that are disambiguated in lab—so your projection will be able to tell them apart by pattern matching on the syntax.

Also note the side condition on lambda. A Scheme lambda projects to a K-normal form funcode only if neither of them has any free variables—or more precisely, all the free variables are global variables. Your implementation can skip this check for now.8

Semantics of funcode

The funcode form requires more explanation. From a semantic point of view, there are two ways to handle it: either it behaves like a classic Lisp lambda from 1958, or it behaves like a closure that captures only global variables. The latter form is what we want, and it is specified by the evaluation rule

\[\frac{}{\langle \mathtt{funcode}\; \mathit{xs}\; \mathtt{=>}\; e, \rho, \sigma \rangle \Downarrow \langle ⦇ \lambda \mathit{xs} . e, \rho_0 ⦈, \sigma \rangle}\text,\]

where \(\rho_0\) is the initial environment, i.e., the one mapping every global variable to its location in the global-variable table.


  1. With one exception, which is a performance cheat.

  2. In module 10 we will add a recursive let form.

  3. Function parameters are also considered local variables.

  4. If your browser is not rendering all the subscripts, it might help to install the Deja Vu fonts.

  5. The Oxford brackets aren’t pronounced. If we must call attention to the notation, we say that the if expression is “quoted.”

  6. Technically, a right-hand side should be notated by wrapping the whole thing in Oxford brackets and antiquoting the recursive call to 𝓔. But the result would be an unreadable mess, so nobody does it.

  7. The SETGLOBAL form is executed for both side effect and value, so it projects as a two-expression sequence: the setglobal primitive for side effect, and the register (from which the global is set) for value.

  8. In module 10, where we actually implement lambda, free variables are far more relevant—and you’ll implement a function that finds them. Until then, your projection function can take it on faith that a lambda has no free variables.