K-Normalization and Register Allocation

K-Normalization is the middle of the three really fun transformations in our compiler. (The first was code generation; the last will be closure conversion.) It gets us from pure, first-order Scheme to K‑normal form.

Introduction

To introduce the transformation, let’s look at an example expression:

x + 1

This expression is not in K‑normal form because the operands of the + operation aren’t both in registers. After K‑normalization, this expression becomes

let t = 1 in x + t

where t is a temporary register.

Here’s a slightly more general expression:

x + e

It’s tempting to recursively K‑normalize e to e’, pull a temporary register t, and emit

let t = e' in x + t

But this approach has a problem: it K‑normalizes x + y to

let t = y in x + t,

which introduces an unnecessary move instruction (and also burns a register to no purpose).

To fix this issue, we’ll define a helper function that builds a let expression only when needed. We don’t care what t is, so we’ll pass the two expressions e’ and x + [●], where [●] stands for a “hole” that the helper function will fill with t. In our example,

let t = e' in x + t

is produced by metalanguage code that looks something like this:

smartLet e’ ⟦x + [●]⟧         (* artist’s conception *)

A value of type “expression with hole” is represented in the metalanguage by a function of type reg -> reg exp, and the hole is filled by applying the function to whatever register goes in the hole. The type of the helper doesn’t look much like the type of a let form, so instead of smartLet, we’ll call the helper function bindAnyReg: the helper function, not the caller, picks t. My first draft looks something like this:

bindAnyReg e’ (fn t => ⟦x + $(t)⟧)       (* early rough mix *)
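The same first draft can be sketched outside the metalanguage. The Python below is a hypothetical rendering, not the course's actual code: the expression representation and the naive register supply are my own inventions, chosen only to make the “expression with hole” idea concrete.

```python
import itertools

_fresh = itertools.count(1)    # naive stand-in for "any available register"

def bind_any_reg(e, hole):
    """First draft: always let-bind normalized expression e to a fresh
    register t, then fill the hole by applying the continuation to t."""
    t = next(_fresh)
    return ("let", t, e, hole(t))   # let t = e in (hole t)

# K-normalizing x + e', where e' is the already-normalized literal 1
# and x lives in register 0:
code = bind_any_reg(("lit", 1), lambda t: ("+", ("reg", 0), ("reg", t)))
# code represents: let t = 1 in x + t, with t = register 1
```

As in the metalanguage, the “expression with hole” is just a function from a register to an expression, and the helper, not the caller, picks the register.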

An example

A Scheme primitive call (@ e₁ e₂) will bind both argument expressions to registers:1

normalize (@ e₁ e₂)
  = { ask bindAnyReg to place each argument in a register }
bindAnyReg (normalize e₁) (fn t =>
  bindAnyReg (normalize e₂) (fn u =>
    @(t, u)))

To illustrate this transformation, I K-normalize the expression (+ 2 3) (an expression that we might prefer to evaluate at compile time):

normalize (+ 2 3)
  = { place each argument in a register }
bindAnyReg (normalize 2) (fn t =>
  bindAnyReg (normalize 3) (fn u =>
    +(t, u)))

The literal 2 is K-normalized as 2:

bindAnyReg (normalize 2) (fn t =>
  bindAnyReg (normalize 3) (fn u =>
    +(t, u)))
  = { normalize the source literal 2 making a target literal 2 }
bindAnyReg 2 (fn t =>
  bindAnyReg (normalize 3) (fn u =>
    +(t, u)))

Now bindAnyReg allocates available register t to hold the value of target expression 2. It then returns a let binding whose body is obtained by applying bindAnyReg’s continuation to t:

bindAnyReg 2 (fn t =>
  bindAnyReg (normalize 3) (fn u =>
    +(t, u)))
  = { let-bind 2 to available register t }
let t = 2 in $((fn t =>
  bindAnyReg (normalize 3) (fn u =>
    +(t, u))) t)

  = { function application (β-reduction) }
let t = 2 in
  $(bindAnyReg (normalize 3) (fn u =>
    +(t, u)))

Now the same thing happens with literal 3: it gets K-normalized as 3, and bindAnyReg allocates another available register. Since t is now in use, bindAnyReg allocates u. And it returns a let binding whose body is obtained by applying bindAnyReg’s continuation to u:

let t = 2 in
  $(bindAnyReg (normalize 3) (fn u =>
    +(t, u)))

  = { K-normalize 3 as 3 }
let t = 2 in
  $(bindAnyReg 3 (fn u =>
    +(t, u)))

  = { let-bind 3 to available register u }
let t = 2 in
  let u = 3 in $((fn u =>
    +(t, u)) u)

  = { function application (β-reduction) }
let t = 2 in
  let u = 3 in
    +(t, u)

The final target expression let t = 2 in let u = 3 in +(t, u) is in K-normal form.

Another example: avoiding unnecessary moves

Suppose I K-normalize the expression (+ x 1). I don’t want to allocate a register to hold x, because as long as x is a local variable, it is already in a register. In this special case, bindAnyReg can simply apply its continuation to x, without emitting a let binding:

normalize (+ x 1)
  = { place each argument in a register }
bindAnyReg (normalize x) (fn t =>
  bindAnyReg (normalize 1) (fn u =>
    +(t, u)))
  = { normalize x as x }
bindAnyReg x (fn t =>
  bindAnyReg (normalize 1) (fn u =>
    +(t, u)))
  = { x is already in a register; pass it to the continuation }
(fn t =>
  bindAnyReg (normalize 1) (fn u =>
    +(t, u))) x
  = { function application (β-reduction) }
bindAnyReg (normalize 1) (fn u =>
  +(x, u))
  = { normalize 1 as 1 }
bindAnyReg 1 (fn u =>
  +(x, u))
  = { bind 1 to register u; pass u to continuation }
let u = 1 in
  $((fn u =>
    +(x, u)) u)

  = { function application (β-reduction) }
let u = 1 in
  +(x, u)

Expression (+ x 1) is K-normalized as let u = 1 in +(x, u) without burning an extra register.
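Both worked examples can be reproduced mechanically. The sketch below is Python rather than the metalanguage, and the representation of expressions and the naive register supply are my own inventions, not the assignment's; it exists only to show bindAnyReg's two behaviors side by side.

```python
import itertools

def make_normalizer():
    fresh = itertools.count(1)               # naive register supply

    def bind_any_reg(e, k):
        if e[0] == "reg":                    # already in a register:
            return k(e[1])                   # no let, no move
        t = next(fresh)                      # otherwise, let-bind to
        return ("let", t, e, k(t))           # any available register

    def normalize(e):
        if e[0] in ("lit", "reg"):
            return e
        op, e1, e2 = e                       # a primitive call (op e1 e2)
        return bind_any_reg(normalize(e1), lambda t:
               bind_any_reg(normalize(e2), lambda u:
                   (op, ("reg", t), ("reg", u))))

    return normalize

normalize = make_normalizer()
print(normalize(("+", ("lit", 2), ("lit", 3))))
# the first example: let t = 2 in let u = 3 in +(t, u)

normalize = make_normalizer()
print(normalize(("+", ("reg", 0), ("lit", 1))))
# the second example: let u = 1 in +(x, u), with x in register 0
```

The literal 3 gets a let binding, but the variable x, already a register, is passed straight to the continuation, which is exactly how the unnecessary move is avoided.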

Background: Managing registers

Saying “bindAnyReg can allocate a new register” is all very well, but the allocation of registers is the crux of the transformation. An essential property of K‑normal form is that if each name corresponds to a machine register, the code can be translated directly into assembly language without needing any additional registers. K‑normalization is mostly a register-allocation problem. And register allocation is one place where the simple virtual-machine approach diverges from a classic compiler:

One other thing: what happens if you run out of machine registers?

Registers and procedure calls

Lots of times the compiler just needs “any register,” but sometimes it needs a particular register. For example, if a compiler needs to divide integers on the x86, there is only one register that supports integer division, and the divisor has to go in that register.

Intel hardware is just annoying, but there is another example that applies to pretty much any hardware: procedure calls. When registers are used to pass information from one procedure to another, the choice of which values go where is dictated by a procedure calling convention. That convention usually calls for particular registers. Our convention is particularly simple: when a function is called, the function value itself arrives in register 0, and the actual parameters are in registers 1, 2, 3, and so on.

That’s from the callee’s perspective; from the caller’s perspective, thanks to register windows, we can pick any contiguous sequence of registers—provided we don’t mind losing the values in all the higher-numbered registers. To implement a procedure call from the caller’s side, we’ll identify a register \(k\) such that all registers with numbers \({}\ge k\) are available. To call a procedure with \(n\) parameters, we’ll use registers \(k\) through \(k+n\) (inclusive) to hold the procedure and its actual parameters. And the callee will have the right to destroy every single one of them—as well as all higher-numbered registers from \(k+n+1\) onward.
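The arithmetic of the convention is simple enough to spell out. The helper below is a toy calculation of my own, not compiler code; it just names which registers a call occupies.

```python
def call_registers(k, n):
    """Registers used by a call with n actual parameters when every
    register numbered >= k is available: the procedure itself goes in
    register k, and parameter i goes in register k + i."""
    return {"function": k,
            "parameters": [k + i for i in range(1, n + 1)]}

# A 3-argument call when registers 4 and up are free:
print(call_registers(4, 3))
# {'function': 4, 'parameters': [5, 6, 7]}
```

The callee may then destroy registers 4 through 7, and every higher-numbered register as well.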

Whether it’s to implement calling conventions or just to deal with finicky hardware, putting a value into a specific, conventional register is sometimes called targeting. If there’s a coalescing register allocator in the picture, targeting is no problem: the compiler just allocates a fresh temporary for every actual parameter, then emits an instruction that moves the temporary into the conventional register. When possible, the coalescing register allocator eliminates the move.

Our approach to register allocation

In a virtual machine, it’s still possible to “allocate temporaries like crazy.” But this strategy requires a separate pass to assign a machine register to each temporary. Instead, I’m advocating a one-pass approach that places both local variables and intermediate values into machine registers directly, during K‑normalization. This approach has the potential to waste lots of machine registers, but it is simple, fast, and relatively easy to implement. And by design, we have registers to burn.

Here are the elements of the approach:

Although (in my opinion) bindAnyReg is best written in continuation-passing style, most of a K‑normalizer can be written nicely as a direct-style recursive function. That function’s main jobs are to manage the set of available registers and to keep track of the correct environments. Detailed advice can be found in the step-by-step instructions for the module.

Two strategies for implementation, or, why compiled implementations are fast

Now it’s worth taking a step back and thinking about how we are implementing the semantics and what we’re getting out of the virtual machine. For review, the simplest implementation of a language and its semantics is usually a definitional interpreter. This is the sort of interpreter you may have encountered in a course like CS 105, where a semantics is implemented directly as a more-or-less functional program. A definitional interpreter has these elements:

When we transition to a virtual machine, our notions of values and stores change very little. But our notions of syntax and environment change radically:

How does the environment work? How does the translation work? To answer these questions, we have to think more about σ. Ignoring heap objects, in the definitional interpreter, σ just refers to slots in the environment. In the virtual machine, σ refers to machine registers. And in both the definitional interpreter and in the virtual machine, ρ does not change during program execution. That is, once a variable’s location is chosen, that variable doesn’t move.

With this property in mind, let’s revisit the operational-semantics rule for evaluating a variable:

\[\frac{x \in \mathrm{dom}\;\rho \qquad \ell = \rho(x)} {\langle \mathtt{VAR}(x), \rho, \sigma\rangle \Downarrow \langle\sigma(\ell), \sigma\rangle}\]

There are two lookups here: first look \(x\) up in ρ to get location \(ℓ\), then fetch the contents of \(ℓ\). In the definitional interpreter, both lookups are done at run time. In the virtual machine, the compiler does the first lookup at compile time, gets location \(ℓ\), and drops \(ℓ\) into the assembly code: \(ℓ\) is represented by a register number. The only thing that happens at run time is fetching a value out of location \(ℓ\). This operation is not only constant time; it’s fast. (The whole purpose of registers is to be fast.)
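The split between the two lookups can be modeled directly. In this toy Python sketch (names and representation are mine), the environment lookup happens once, when the “code” is built; all that survives to run time is an index into the registers.

```python
# Compile time: look x up in rho once and bake the register
# number into the generated "code".
rho = {"x": 3}                  # compile-time environment: name -> register

def compile_var(x):
    ell = rho[x]                # first lookup: done once, at compile time
    return lambda registers: registers[ell]   # the "assembly code"

# Run time: only the second lookup remains -- a constant-time fetch.
code = compile_var("x")
registers = [0, 0, 0, 42]       # machine registers; register 3 holds 42
print(code(registers))          # 42
```

However the program runs thereafter, rho is never consulted again: the variable's location is fixed, so the expensive lookup is paid for exactly once.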

This development answers the question, “what are we getting out of the virtual machine?” First and foremost, we are getting speed, and we are getting it by replacing an expensive data-structure lookup with a fast load from a register. This transformation provides the single biggest performance improvement available with a virtual machine, and it’s enabled by K‑normalization.


  1. The code shown in the examples doesn’t show all the details. For example, it doesn’t show how to keep track of the available registers. Use the examples as an illustration of the key ideas, not as a template for your own code.

  2. Or you could just pretend to build a compiler and leave the real work to LLVM.

  3. If you’ve studied computational complexity, you may recognize that optimal register use is NP-hard.