Introduction

Procedure calls and procedure calling conventions are among the deepest ideas in compiler construction. Since they implement recursion—a truly deep idea—the depth shouldn’t suprise you. Fortunately, the ideas themselves are simple. And because we control the SVM’s instruction set and procedure-calling convention, we will keep the implementation equally simple.

Recursive procedures work like this:

Every activation of a recursive procedure uses the same names for parameters and local variables.
And yet, in every activation, those names refer to different hardware locations.
The apparent inconsistency is resolved using a stack.

This idea is part of the curriculum in CS 40, and only one thing is different: in CS 40, the stack is a stack of memory locations. In the SVM, the stack is a stack of VM registers.

Location names and recursive functions: AMD64/SVM comparison

The best way to understand procedure calls, and to review your learning from CS 40, is to go deep on an example. We’ll use the naive, quadratic list-reversal function. In C code, it looks like this:

List reverse(List xs) {
  if (xs == NULL)
    return NULL;
  else
    return append(reverse(xs->cdr), list1(xs->car));
}

To understand how this recursive procedure is implemented, we’ll look at the AMD64 assembly code:¹

reverse:
    # parameter xs passed in register %rdi
    pushq   %rbp                   # save the base pointer
    movq    %rsp, %rbp             # old stack ptr is new base ptr
    pushq   %rbx                   # save nonvolatile register
    subq    $24, %rsp              # allocate activation (shift stack)
    movq    %rdi, -24(%rbp)        # save parameter xs at -24(%rbp)
    cmpq    $0, -24(%rbp)          # xs == NULL ?
    jne     .L2                    # if not, go to L2
    movl    $0, %eax               # put 0 in return register
    jmp     .L3                    # .L3 restores context and returns
.L2:
    ... omitted for simplicity ...
    call    list1
    ... omitted for simplicity ...
    movq    -24(%rbp), %rax        # set %rax to xs
    movq    8(%rax), %rax          # set %rax to xs->cdr
    movq    %rax, %rdi             # pass xs->cdr to reverse
    call    reverse
    # result arrives in register %rax
    movq    %rbx, %rsi
    movq    %rax, %rdi
    # actual parameters passed in registers %rdi and %rsi
    call    append                 # tail call, not optimized!
.L3:
    addq    $24, %rsp              # restore the stack frame
    popq    %rbx                   # restore saved registers
    popq    %rbp
    ret                            # return

This code is worth analyzing, but to make the analysis easier, let’s also show my SVM code for the same procedure. The source code looks like this:

(define reverse (xs)
  (if (null? xs)
      '()
      (append (reverse (cdr xs) (list1 (car xs))))))

The SVM assembly code looks like this:

r0 := function (1 arguments) {
  ;; argument `xs` arrives in register `r1`
  r2 := null? r1
  if r2 goto L1   
    ;; if xs is not empty
    r2 := global append
    r3 := global reverse
    r4 := cdr r1
    r3 := call r3 (r4)    ;; r3 := (reverse (cdr xs))
    r4 := global list1
    r5 := car r1
    r4 := call r4 (r5)    ;; r4 := (list1 (car xs))
    tailcall r2 (r3, r4)  ;; returns (append r3 r4)
  L1:
    ;; if xs is empty
    r0 := '()
    return r0
}
global reverse := r0

Comparing these two assembly-language sequences can tell us a lot about what is going on. Don’t try to digest all the details, but look over this comparison table, and remember what’s discussed here—then when it’s time to implement call, return, and tailcall, use it for reference.

AMD64	SVM
Internally, the location of parameter `xs` has the same name in every activation (`-24(%rbp)`).	Internally, the location of parameter `xs` has the same name in every activation (`r1`).
The meaning of the name `%rbp` is changed at the beginning of the procedure (by the `movq` instruction on line 4).	The meaning of the name `r1` is changed before the procedure starts to run (by the `call` instruction that activates the procedure).
The previous meaning of `%rbp` is restored before the procedure returns (by the `popq` instruction on line 28).	The previous meaning of `r1` is restored as the procedure returns (by the `return` instruction on line 17, or by the `return` instruction in `append`, which is tail called on line 13).
The `call` instruction saves the return address on the call stack.	The `call` instruction saves the return address on the call stack.
Incoming parameters arrive in conventional registers (`%rdi`, `%rsi`, and so on).	Incoming parameters arrive in conventional registers (`r1`, `r2`, and so on).
Calls to functions `list1`, `reverse`, and `append` pass parameters in the same conventional registers at each call (`%rdi`, `%rsi`, and so on).	Calls to functions `list1`, `reverse`, and `append` pass parameters in different registers at each call: `reverse`’s parameter is passed in `r4`, `list1`’s parameter is passed in `r5`, and `append`’s parameters are passed in `r3` and `r4`.
By AMD64 convention, the result from `reverse` must come back in register `%rax`, and to get it where it needs to go to append, the value has to be moved into `%rdi` (lines 20 and 22).	The SVM can take a result into any register, so the `call` to reverse puts the result into `r3` (line 9), and when it’s time to call `append` (line 13), the value is already where it needs to be—it doesn’t have to be moved.
On the AMD64, the call to `append` produces its result in `%eax`, which is where `reverse` is also expected to produce its own result. So between lines 24 and 29, no work has to be done to get the result in the right place. This is an example of a small optimization.	On the SVM, more optimization is possible. By making a `tailcall` to `append` on line 13, the assembly code not only ensures that the result of `reverse` is the result of `append`, it also instructs `append` to reuse its register window and activation record. The “push then pop” computation executed on the AMD64 is avoided here; that makes `tailcall` an optimized tail call. Given a suitable calling convention, any tail call can be optimized. Even when, as here, the call is not recursive.

Location names and register windows

The big new idea in the SVM involves the names of locations. The C compiler uses locations on the stack, like -24(%rbp). We know that this stands for “the location 24 bytes below the base pointer,” but if we abstract a little bit, this phrase is just a name for the location. Every activation of reverse uses this same name, but in every activation, the name stands for a different location in hardware memory. In the SVM, we’re going to achieve exactly the same effect, but with one difference: we use register names. Using a single register name to mean a different location in each activation is made possible by hardware called register windows.

The idea of register windows is that the SVM has thousands of registers, but at any one time, code can only refer to a contiguous sequence of 256 registers. That sequence is the register window.

At a typical procedure call, the register window shifts toward higher-numbered registers. Here’s a picture of the call to list1 in function reverse:

In the figure, here’s what’s going on:

The row of boxes outlined in black are the actual registers.
The names shown in red, above the registers, are the names used in function reverse. The register window begins with the red r0.
The names shown in blue, below the registers, are the names used in function list1. The register window begins with the blue r0.

At the point where list1 is called, let’s look at reverse and see how each register is used.

According to the rules of our calling convention, register r0 holds the function being called (reverse) and r1 holds the first parameter (xs).
The code was in the process of setting up a call to append, so append is in register r2 and its first parameter is in r3.
The code is about to call list1, so the function about to be called is in register r4, and its actual parameter is in r5.

The next instruction to be executed is going to be

  r4 := call r4 (r5)

How can this work? Doesn’t the calling convention say the function goes in r0 and the first parameter in r1? Yes! When the VM sees the call instruction, it shifts the register window. The red r4 becomes the blue r0, the red r5 becomes the blue r1, and so on.

Imagine a piece of cardboard with a rectangular cutout 256 slots wide. That’s the window. At the call, the window slides 4 slots to the right. One beauty of this approach is that reverse can keep its private information in red registers r0 to r3, and nothing bad can ever happen to them, because they are inaccessible. Function list1 does not even have names for those registers.

In your VM, the easy way to implement register windows is to keep a pointer to register 0. A reference to a numbered register is implemented by an indexing operation from that pointer.

Hardware register windows: History and commentary

At the dawn of the microprocessor era in the 1970s, the circuits needed to build fast registers were so expensive that many processors didn’t have them. Early microprocessors had only two real registers (a program counter and a stack pointer). The Intel x86 architecture had a full eight general-purpose registers. Wow!

But from the poor compiler’s point of view, 8 registers aren’t a lot. The compiler has to figure out how to manage all of its local variables and temporary data using only 8 registers. This problem is called register allocation, and is one of the hardest problems in compilation. Many compiler courses, like this one, punt on register allocation—perhaps outsourcing it to LLVM.

By the middle 1980s, hardware had advanced enough that new instruction-set designs, like ARM and MIPS, supported 16 or even 32 general-purpose registers. (That’s why AMD64 is so much better than x86—it has twice as many registers.) And hardware kept getting better, and suddenly we had another problem: we started running out of space for register names. With 32 registers, an ADD instruction with three registers might spend 15 bits of the instruction encoding just to encode register names. It started to look like we couldn’t have more registers just because we couldn’t find bits in the instruction word. And then engineers at Berkeley and Sun had a brilliant idea: suppose we build hardware with more registers than we can actually name at one time? And we could change the names at will, so in different parts of the program, we could use the same names for different registers. That is the idea of register windows.

The register-windowed architecture with the largest commercial footprint was undoubtedly the SPARC. It provided 8 “global” registers along with a 24-register window that could be shifted in multiples of 16. This architecture made it pretty easy to build in, say, 72 or 80 hardware registers even though the instruction word could name only 32 registers.

Hardware register windows looked like a cool idea, but only for a brief era: hardware had enough transistors that it could support extra registers, but not so many that the hardware could basically remap instruction references into big, internal register numbers (which is the case today). That era lasted about 15 years: the amount of time that elapsed between the announcement of the SPARC architecture and the announcement of the AMD64 architecture. AMD64 updated the Intel platform from 8 registers to 16 registers. 16 registers are just good enough for most purposes, and Intel’s fabrication prowess, combined with the dominance of Microsoft Windows, eventually rendered the architectural insights of the 1980s and 1990s commercially irrelevant. Desktop and server applications are stuck with AMD64: a classic Worse is Better situation.

But for a virtual machine, register windows are still a superb choice.

Implementing functions and calls

Sliding the register window is only part of implementing a function call. To understand the rest, let’s start with what the VM needs to know about a function:

What sequence of instructions it wants to execute
The number of registers it uses
How many parameters it is expecting

This information is part of the representation of a struct VMFunction defined in header file value.h. The information is used as follows:

The instruction sequence is what vmrun starts executing when a function is called. It can be stored in the VM’s state data structure or just cached in a local variable of vmrun.
The number of registers is used to make sure that the VM’s representation of the complete register file has already allocated a slot for each register that might be used. That is, the VM ensures that every register that might be used is mapped.
The number of parameters the function expects² is compared with the number of actual parameters at the call site. If they match, all is well. otherwise, the VM has three options:
- It can implement a \(\mu\)Scheme-style semantics in which a mismatch in the number of parameters causes a run-time error. This is the semantics of vScheme and is the default behavior.
- It can implement a semantics in which the actual parameters are “adjusted” to fit the function: extra actual parameters are discarded, and missing actual parameters are initialized to nil. This is the semantics of Lua; it is available to implement for depth points.
- It can implement a partial-application semantics in which functions are automatically curried:
  - If there are extra actual parameters, the VM calls the function with only as many parameters as it is expecting, then applies the result to the extra parameters.
  - If there are missing actual parameters, the VM creates a closure that expects to receive the missing parameters, then activates the original function.
  This is the semantics of Haskell and OCaml; it is available to implement for depth points.

Once a function’s information is known, the function can be called. And when its code starts to run, the function object itself is in register r0, and its actual parameters (1 to \(n\)) are in registers r1 to r\(n\). Higher-numbered registers (registers \(n+1\) to 255) contain unspecified values and are available for any purpose.

The call stack

A hardware implementation of procedure calls has to save the calling context in registers and memory. But in our SVM, we introduce additional state just for managing calls. (Real hardware has such state as well, but it is buried in caches and optimizations; it’s not visible in the instruction-set architecture.) That state is a stack, and for each activation it records the following information:

Where the register window is
Where the suspended execution is slated to resume
What register in the caller is expecting to receive the result (the destination register)

This information is stored in a record of type struct Activation, which you will define in header file vmstack.h. In the operational semantics, the record is written as (R, I₁•I₂, r), where R represents the register window, I₁•I₂ represents the instruction stream and program counter, and r represents the destination register. These records are organized into a finite array, which the VM manages as a stack.

The three instructions

The call stack is manipulated by three instructions: call, return, and tail call.

To call a function f, the caller places f’s value in register \(r[k]\), and it places N actual parameters in registers \(r[k+1]\) through \(r[k+N]\). It then executes a call instruction with the \(X\) field set to \(x\) (naming a result register), the Y field set to \(k\), and the Z field set to \(k+N\). The call is assumed to kill registers \(r[k]\) and higher—that is, all the registers in the callee’s window. When the call terminates, the returned value is in register \(r[x]\).

To implement the call, the VM pushes a struct Activation on the call stack, slides the register window, and activates the new function.

An active function f terminates in one of two ways:

It returns a value. There must be an activation on the stack, which tells where the return value is expected. The register window slides; the returned value gets placed where the caller requested it; the instruction stream and program counter are restored to their values before the call, and the struct Activation is popped off the call stack.
It makes a tail call to another function g. This looks like an ordinary call with these changes:
- The register window doesn’t move. Instead, g is copied into register r0, and the actual parameters (if any) are copied into r1, r2, and so on.
- Nothing happens to the call stack.

Once you’ve implemented a register window, a call stack, and the three instructions, you’ve implemented procedure calls.

If you’re rusty, I found a quick refresher at MIT.↩
This is the number of formal parameters in the function’s definition.↩