Here we start our long journey from the machine level to the high level. “We can so have nice things!”
The nice thing we’d ultimately like to have is a typical scripting language: one that’s dynamically typed and that offers both functional and imperative features. From there we can consider adding fancy bits, like objects or a type system.
Ideas of K-normal form
Our first step is called K-normal form. Its key ideas are as follows:
K-normal form is a language of expressions that’s going to look like a subset of Scheme: a typical expression is evaluated for a value, but some expressions may also be evaluated for a side effect.
K-normal form includes two forms of function call: an ordinary form, which compiles to a
call
instruction (to be implemented in module 7), and a special primitive form, which compiles to whatever VM instruction goes with the primitive.In K-normal form, expressions and calls are not nested. VIEWED AS CONFUSING; FIX ME A function (ordinary or primitive) can be called only if it is named, and all the arguments of a call are named.1
Names are different. In K-normal form, a name always refers to a local variable bound with
let
, or to a parameter. A global vScheme variable is handled by writing the variable’s name as a literal value, which is then passed as a parameter to primitivesetglobal
orgetglobal
. In the SVM loader, each value is converted into an index, which refers to a slot on the VM’s global-variable table.K-normal form is shockingly close to machine code. Just replace each name with a machine register, and it’s almost assembly code.
There are no definition forms! A definition is coded as an assignment to a global variable.
Why is K-normal form a language of expressions, not statements or instructions? Because our compiler (UFT) is going to translate between programs using the same “substitute equals for equals” techniques we used to do calculational proofs in CS 105. These techniques enable us to avoid all the fancy compiler stuff that goes under the heading of “dataflow analysis.”
The syntactic structure of K-normal form
The syntax of K-normal form refers to the usual names, expressions, and literal values, but it also refers to virtual-machine operators. A virtual-machine operator, written using metavariable @
, is a name that looks like a function, but an application of a virtual-machine operator does not compile to a function call; it compiles to a single instruction on the SVM.
e |
Expression |
x |
Name |
v |
Literal (number, string, Boolean, and so on) |
@ |
Virtual-machine operator (might be + or cons , for example) |
K-normal form is a language of expressions—effectively a subset of vScheme—which satisfies some invariants and restrictions that keep the syntactic structure simple enough to generate code from without requiring a ton of semantic analysis:
Every argument to every function call and VM operator has a name—except in one special case, where one argument may be a literal value.
The condition in an
if
expression orwhile
loop has a name.There is only one form of
let
, and it binds exactly one variable.2The
set
and variable forms refer only to local variables.3 Global variables are read and written by VM operatorsgetglobal
andsetglobal
.The
begin
form is restricted to exactly two expressions.The
funcode
form looks like alambda
, but it isn’t: afuncode
may not have any free names! (That is, all its free variables are global variables.)
The syntactic structure of K-normal form is so central that I present it in two notations:
- An ML-like notation, which is easy to read, write, and understand
- A Scheme-like notation, which is easy to relate to Scheme
The ML-like notation looks like this:4
e ::= v -- literal value
| x -- name
| @(x₁, ..., xₙ) -- VM operation
| @(x₁, ..., xₙ, v) -- VM operation with one literal
| x(x₁, ..., xₙ) -- function call
| if x then e₁ else e₂
| let x = e in e'
| (e₁; e₂) -- sequence
| x := e -- assignment to local variable
| while x := e do e' -- loop with named condition
| funcode (x₁, ... xₙ) => e -- can't have free names
The “VM operation with one literal” form is going to be useful for VM instructions that use the R1LIT
parser. Depending on how you’ve designed your SVM, those instructions might include check
, expect
, and maybe also setglobal
and getglobal
.
Examples of the ML-like notation might look like this:
Source code:
if n < 0 then negated n else n
K-normal form:
let t1 = 0 in let t2 = <(n, t1) in if t2 then let t3 = getglobal("negated") in t3(n) else n
Source code:
x :: revAppend (xs, ys)
K-normal form:
let t1 = getglobal("revAppend") in let t2 = t1(xs, ys) in cons(x, t2)
The Scheme-like notation looks like this:
e ::= v -- literal value
| x -- name
| (@ x₁ ... xₙ) -- VM operation
| (@ x₁ ... xₙ v) -- VM operation with one literal
| (x x₁ ... xₙ) -- function call
| (if x e₁ e₂)
| (let ([x e]) e')
| (begin e₁ e₂)
| (set x e)
| (while (let ([x e]) x) e') -- loop with named condition
| (funcode (x₁ ... xₙ) e) -- can't have free names
To make the while
form truly a subset of Scheme, I have to duplicate the x
. But like the ML notation while x := e do e'
, your internal representation should have just one copy of x
.
Examples of the Scheme-like notation might look like this:
Scheme source code: (if (< n 0) (negated n) n)
K-normal form, assuming n is a local variable:
(let ([t1 0]) (let ([t2 (< n t1)]) (if t2 (let ([t3 (getglobal 'negated)]) (t3 n)) n)))
Equivalent vScheme:
The concrete syntax above looks like valid vScheme, but it isn’t: vScheme doesn’t have a getglobal operation. In vScheme, the way to refer to a global variable is just to use its name. It is also conventional to write nested
let
expressions using an equivalentlet*
. So the example shown above could be projected from a vScheme expression like this:(let* ([t1 0] [t2 (< n t1)]) (if t2 (let ([t3 negated]) (t3 n)) n))
To test this expression, it would have to be wrapped in code that binds
n
locally, like (let ([n ’uninitialized]) …).Scheme source code: (cons x (append xs ys))
K-normal form, allowing let*, and assuming x, xs, and ys are local variables
(let* ([t1 (getglobal 'append)] [t2 (t1 xs ys)]) (cons x t2))
Equivalent vScheme:
(let* ([t1 append] [t2 (t1 xs ys)]) (cons x t2))
Relationship between vScheme and K-normal form
K-normal form corresponds to a first-order subset of vScheme. Here’s a rough correspondence (the meaning of the colors is explained below):
vScheme AST | K-normal form |
---|---|
v |
|
x or @ or getglobal (you disambiguate in lab) |
|
set or setglobal |
|
if |
|
(while (let ...) ...) |
|
@(...) or x(...) (you disambiguate in lab) |
|
let |
|
setglobal |
|
setglobal and funcode |
|
funcode , provided all free variables are global variables |
Using this correspondence, it should be easy to embed a K-normal into μScheme code—that means, change the representation of the code from K-normal form to μScheme. Projecting μScheme as a K-normal form, however, requires more information: to know what K-normal form corresponds to VAR
, SET
, or APPLY
, each name and call in the source code has to be disambiguated. That will be done in lab. The embedding and projections themselves are a major part of your homework.
Terminology and notations for embedding and projection
Embedding and projection, like many other translations, can be expressed using recursion equations. Writing such a translation is challenging because there are three languages in play:
The language in which the translator itself is implemented is the
. For us, the metalanguage is likely to be Standard ML. And when a translation is specified in a book, a paper, or a handout, the metalanguage is typically mathematics.The language being translated from is the source language.
The language being translated to is the target language.
Both source and target languages are object languages; “object language” is the word for a language that is manipulated by a translator. In many translations, there is just one object language; the source language and target language are the same. Such translations are often called “optimizations.”
Having all these languages presents a notational problem: how to tell them apart. I use a notational trick commonly used in programming languages: to quote any term of an object language, I give it a colored background, and I wrap the term in “Oxford brackets”: ⟦...⟧
for a source-language term and ⟦...⟧
for a target-language term. For example, if I want to say how to translate an if
expression from K-normal form into vScheme, I’ll quote the if
expression as ⟦if x then e₁ else e₂⟧
.5 The Oxford brackets are not that different from the double quote marks you are used to for writing string literals, and they can be used for both source and target languages.
The metalanguage is not supposed to be quoted within Oxford brackets. But on rare occasions we want to put metalanguage code inside the Oxford brackets. This exception requires some sort of notation to “escape” the act of being quoted. I use $(, where the ) is code in the metalanguage. This notation, which is borrowed from the POSIX shell language, is a form of antiquotation. The existence of the antiquotation technically makes the Oxford brackets a “quasiquotation,” which means “quotation that can be escaped.”
Quasiquotation isn’t just for theory; it is supported by a number of programming languages. Lisp was the first; all Lisp dialects allow you to quasiquote S-expressions. Standard ML of New Jersey includes a somewhat rudimentary mechanism. But the really nice quasiquoter is the one for Haskell designed and implemented by Geoff Mainland. It has been used effectively in all sorts of language implementations.
Equations of embedding and projection
K-normal form is embedded into Scheme by a function 𝓔 (for “embed”) that satisfies the laws below. In each law, the left-hand side is metalanguage (with quoted object-language terms), and the right-hand side is an unholy mix of object language and metalanguage.6
K-normal form | = Embedding into Scheme |
---|---|
𝓔⟦v⟧ | = … not written here … |
𝓔⟦x⟧ | = x |
𝓔⟦if x then e₁ else e₂⟧ | = (if x 𝓔⟦e₁⟧ 𝓔⟦e₂⟧) |
𝓔⟦let x = e in e’⟧ | = (let ([x 𝓔⟦e⟧]) 𝓔⟦e’⟧) |
𝓔⟦(setglobal(x, X); x)⟧ | = (set X x) |
𝓔⟦(e₁; e₂)⟧ | = (begin 𝓔⟦e₁⟧ 𝓔⟦e₂⟧) |
𝓔⟦(x := e)⟧ | = (set x 𝓔⟦e⟧) |
𝓔⟦while x := e do e’⟧ | = (while (let ([x 𝓔⟦e⟧]) x) 𝓔⟦e’⟧) |
𝓔⟦funcode (x₁, … xₙ) => e⟧ | = (lambda (x₁ ⋯ xₙ) => 𝓔⟦e⟧) |
The VM-operation form has some special cases, and it requires some antiquotation:
K-normal form | = Embedding into Scheme |
---|---|
𝓔⟦getglobal(v)⟧ | = $( ) |
𝓔⟦setglobal(x, v)⟧ | = (set $( ) x) |
𝓔⟦@(x₁, …, xₙ)⟧ | = ($( ) x₁ ⋯ xₙ) |
𝓔⟦@(x₁, …, xₙ, v)⟧ | = ($(v⟧) ) x₁ ⋯ xₙ 𝓔⟦ |
Function 𝓔⟦v⟧, is not written down here; it’s what you’ll implement in function .
converts a string literal to a name; on other literals, it is not defined. The embedding of values,And to render embedded code easy to read, I recommend adding this special case:
𝓔⟦let x = e in x⟧ = 𝓔⟦e⟧
The special case is useful because the code on the left is what your compiler will have to generate for a call to an unknown function.
Projection 𝓟 takes things in the other direction. It is very much a partial function; when given a Scheme form that doesn’t match any left-hand side, 𝓟 fails.
Scheme | = Projection into K-normal form |
---|---|
𝓟⟦v⟧ | = … not written here … |
𝓟⟦x⟧ | = x, if x is local |
𝓟⟦x⟧ | = getglobal($(, if ))x is global |
𝓟⟦(if x e₁ e₂)⟧ | = if x then 𝓟⟦e₁⟧ else 𝓟⟦e₂⟧ |
𝓟⟦(let ([x e]) e’)⟧ | = let x = 𝓟⟦e⟧ in 𝓟⟦e’⟧ |
𝓟⟦(begin e₁ e₂)⟧ | = (𝓟⟦e₁⟧; 𝓟⟦e₂⟧) |
𝓟⟦(set x e)⟧ | = (x := 𝓟⟦e⟧), if x is local |
𝓟⟦(set x x’)⟧ | = (setglobal(x’, $(, if )); x’)x is global |
𝓟⟦(x x₁ ⋯ xₙ)⟧ | = @(x₁, …, xₙ), |
where x names primitive function @ and @ is not check or expect | |
𝓟⟦(x x₁ v)⟧ | = @(x₁, v), where x names primitive check or expect |
𝓟⟦(x x₁ ⋯ xₙ)⟧ | = x(x₁, …, xₙ), otherwise |
𝓟⟦(while (let ([x e]) x) e’)⟧ | = while x := 𝓟⟦e⟧ do 𝓟⟦e’⟧ |
𝓟⟦(lambda (x₁ ⋯ xₙ) => e)⟧ | = funcode (x₁, … xₙ) => 𝓟⟦e⟧, |
provided all the free variables of the lambda are global |
On the left-hand sides, notice the two x
forms, two set
forms,7 two function-application forms. These are exactly the forms that are disambiguated in lab—so your projection will be able to tell them apart by pattern matching on the syntax.
Also note the side condition on lambda. A Scheme lambda projects to a K-normal form funcode only if neither of them has any free variables—or more precisely, all the free variables are global variables. Your implementation can skip this check for now.8
Semantics of funcode
The funcode form requires more explanation. From a semantic point of view, there are two ways to handle it: either it behaves like a classic Lisp lambda
from 1958, or it behaves like a closure that captures only global variables. The latter form is what we want, and it is specified by the evaluation rule
\[\frac{}{\langle \mathtt{funcode}\; \mathit{xs}\; \mathtt{=>}\; e, \rho, \sigma \rangle \Downarrow \langle ⦇ \lambda \mathit{xs} . e, \rho_0 ⦈, \sigma \rangle}\text,\]
where \(\rho_0\) is the initial environment, i.e., the one mapping every global variable to its location in the global-variable table.