Understanding Procedure Calls

Introduction

Procedure calls and procedure calling conventions are among the deepest ideas in compiler construction. Since they implement recursion—a truly deep idea—the depth shouldn’t suprise you. Fortunately, the ideas themselves are simple. And because we control the SVM’s instruction set and procedure-calling convention, we will keep the implementation equally simple.

Recursive procedures work like this:

This idea is part of the curriculum in CS 40, and only one thing is different: in CS 40, the stack is a stack of memory locations. In the SVM, the stack is a stack of VM registers.

Location names and recursive functions: AMD64/SVM comparison

The best way to understand procedure calls, and to review your learning from CS 40, is to go deep on an example. We’ll use the naive, quadratic list-reversal function. In C code, it looks like this:

List reverse(List xs) {
  if (xs == NULL)
    return NULL;
  else
    return append(reverse(xs->cdr), list1(xs->car));
}

To understand how this recursive procedure is implemented, we’ll look at the AMD64 assembly code:1

This code is worth analyzing, but to make the analysis easier, let’s also show my SVM code for the same procedure. The source code looks like this:

(define reverse (xs)
  (if (null? xs)
      '()
      (append (reverse (cdr xs) (list1 (car xs))))))

The SVM assembly code looks like this:

Comparing these two assembly-language sequences can tell us a lot about what is going on. Don’t try to digest all the details, but look over this comparison table, and remember what’s discussed here—then when it’s time to implement call, return, and tailcall, use it for reference.

AMD64 SVM
Internally, the location of parameter xs has the same name in every activation (-24(%rbp)). Internally, the location of parameter xs has the same name in every activation (r1).
The meaning of the name %rbp is changed at the beginning of the procedure (by the movq instruction on line 4). The meaning of the name r1 is changed before the procedure starts to run (by the call instruction that activates the procedure).
The previous meaning of %rbp is restored before the procedure returns (by the popq instruction on line 28). The previous meaning of r1 is restored as the procedure returns (by the return instruction on line 17, or by the return instruction in append, which is tail called on line 13).
The call instruction saves the return address on the call stack. The call instruction saves the return address on the call stack.
Incoming parameters arrive in conventional registers (%rdi, %rsi, and so on). Incoming parameters arrive in conventional registers (r1, r2, and so on).
Calls to functions list1, reverse, and append pass parameters in the same conventional registers at each call (%rdi, %rsi, and so on). Calls to functions list1, reverse, and append pass parameters in different registers at each call: reverse’s parameter is passed in r4, list1’s parameter is passed in r5, and append’s parameters are passed in r3 and r4.
By AMD64 convention, the result from reverse must come back in register %rax, and to get it where it needs to go to append, the value has to be moved into %rdi (lines 20 and 22). The SVM can take a result into any register, so the call to reverse puts the result into r3 (line 9), and when it’s time to call append (line 13), the value is already where it needs to be—it doesn’t have to be moved.
On the AMD64, the call to append produces its result in %eax, which is where reverse is also expected to produce its own result. So between lines 24 and 29, no work has to be done to get the result in the right place. This is an example of a small optimization. On the SVM, more optimization is possible. By making a tailcall to append on line 13, the assembly code not only ensures that the result of reverse is the result of append, it also instructs append to reuse its register window and activation record. The “push then pop” computation executed on the AMD64 is avoided here; that makes tailcall an optimized tail call.
 
Given a suitable calling convention, any tail call can be optimized. Even when, as here, the call is not recursive.

Location names and register windows

The big new idea in the SVM involves the names of locations. The C compiler uses locations on the stack, like -24(%rbp). We know that this stands for “the location 24 bytes below the base pointer,” but if we abstract a little bit, this phrase is just a name for the location. Every activation of reverse uses this same name, but in every activation, the name stands for a different location in hardware memory. In the SVM, we’re going to achieve exactly the same effect, but with one difference: we use register names. Using a single register name to mean a different location in each activation is made possible by hardware called register windows.

The idea of register windows is that the SVM has thousands of registers, but at any one time, code can only refer to a contiguous sequence of 256 registers. That sequence is the register window.

At a typical procedure call, the register window shifts toward higher-numbered registers. Here’s a picture of the call to list1 in function reverse:

Register Windows in Action
Register Windows in Action

In the figure, here’s what’s going on:

At the point where list1 is called, let’s look at reverse and see how each register is used.

The next instruction to be executed is going to be

  r4 := call r4 (r5)

How can this work? Doesn’t the calling convention say the function goes in r0 and the first parameter in r1? Yes! When the VM sees the call instruction, it shifts the register window. The red r4 becomes the blue r0, the red r5 becomes the blue r1, and so on.

Imagine a piece of cardboard with a rectangular cutout 256 slots wide. That’s the window. At the call, the window slides 4 slots to the right. One beauty of this approach is that reverse can keep its private information in red registers r0 to r3, and nothing bad can ever happen to them, because they are inaccessible. Function list1 does not even have names for those registers.

In your VM, the easy way to implement register windows is to keep a pointer to register 0. A reference to a numbered register is implemented by an indexing operation from that pointer.

Hardware register windows: History and commentary

At the dawn of the microprocessor era in the 1970s, the circuits needed to build fast registers were so expensive that many processors didn’t have them. Early microprocessors had only two real registers (a program counter and a stack pointer). The Intel x86 architecture had a full eight general-purpose registers. Wow!

But from the poor compiler’s point of view, 8 registers aren’t a lot. The compiler has to figure out how to manage all of its local variables and temporary data using only 8 registers. This problem is called register allocation, and is one of the hardest problems in compilation. Many compiler courses, like this one, punt on register allocation—perhaps outsourcing it to LLVM.

By the middle 1980s, hardware had advanced enough that new instruction-set designs, like ARM and MIPS, supported 16 or even 32 general-purpose registers. (That’s why AMD64 is so much better than x86—it has twice as many registers.) And hardware kept getting better, and suddenly we had another problem: we started running out of space for register names. With 32 registers, an ADD instruction with three registers might spend 15 bits of the instruction encoding just to encode register names. It started to look like we couldn’t have more registers just because we couldn’t find bits in the instruction word. And then engineers at Berkeley and Sun had a brilliant idea: suppose we build hardware with more registers than we can actually name at one time? And we could change the names at will, so in different parts of the program, we could use the same names for different registers. That is the idea of register windows.

The register-windowed architecture with the largest commercial footprint was undoubtedly the SPARC. It provided 8 “global” registers along with a 24-register window that could be shifted in multiples of 16. This architecture made it pretty easy to build in, say, 72 or 80 hardware registers even though the instruction word could name only 32 registers.

Hardware register windows looked like a cool idea, but only for a brief era: hardware had enough transistors that it could support extra registers, but not so many that the hardware could basically remap instruction references into big, internal register numbers (which is the case today). That era lasted about 15 years: the amount of time that elapsed between the announcement of the SPARC architecture and the announcement of the AMD64 architecture. AMD64 updated the Intel platform from 8 registers to 16 registers. 16 registers are just good enough for most purposes, and Intel’s fabrication prowess, combined with the dominance of Microsoft Windows, eventually rendered the architectural insights of the 1980s and 1990s commercially irrelevant. Desktop and server applications are stuck with AMD64: a classic Worse is Better situation.

But for a virtual machine, register windows are still a superb choice.

Implementing functions and calls

Sliding the register window is only part of implementing a function call. To understand the rest, let’s start with what the VM needs to know about a function:

This information is part of the representation of a struct VMFunction defined in header file value.h. The information is used as follows:

Once a function’s information is known, the function can be called. And when its code starts to run, the function object itself is in register r0, and its actual parameters (1 to \(n\)) are in registers r1 to r\(n\). Higher-numbered registers (registers \(n+1\) to 255) contain unspecified values and are available for any purpose.

The call stack

A hardware implementation of procedure calls has to save the calling context in registers and memory. But in our SVM, we introduce additional state just for managing calls. (Real hardware has such state as well, but it is buried in caches and optimizations; it’s not visible in the instruction-set architecture.) That state is a stack, and for each activation it records the following information:

This information is stored in a record of type struct Activation, which you will define in header file vmstack.h. In the operational semantics, the record is written as (R, I₁•I₂, r), where R represents the register window, I₁•I₂ represents the instruction stream and program counter, and r represents the destination register. These records are organized into a finite array, which the VM manages as a stack.

The three instructions

The call stack is manipulated by three instructions: call, return, and tail call.

To call a function f, the caller places f’s value in register \(r[k]\), and it places N actual parameters in registers \(r[k+1]\) through \(r[k+N]\). It then executes a call instruction with the \(X\) field set to \(x\) (naming a result register), the Y field set to \(k\), and the Z field set to \(k+N\). The call is assumed to kill registers \(r[k]\) and higher—that is, all the registers in the callee’s window. When the call terminates, the returned value is in register \(r[x]\).

To implement the call, the VM pushes a struct Activation on the call stack, slides the register window, and activates the new function.

An active function f terminates in one of two ways:

Once you’ve implemented a register window, a call stack, and the three instructions, you’ve implemented procedure calls.


  1. If you’re rusty, I found a quick refresher at MIT.

  2. This is the number of formal parameters in the function’s definition.