Operational Semantics of the VM, part II: Calls

Introduction

This semantics extends the basic semantic description from module 1: it adds a call stack and register windows. Where material from module 1 is presented for a second time, significant extensions are highlighted

Semantic domains

The VM computes with values V. Registers are numbered with natural numbers ℕ, as are slots in the literal pool and the global-variable table. Instructions ℐ are left unspecified but will be notated using typewriter font, as in examples halt, check(\(X\),\(YZ\)), or add(\(X\), \(Y\), \(Z\)).

Semantic domains
V VM values
Natural numbers
locations (in the store)
instructions

Metavariables

Here are the metavariables used to notate elements of or functions on the various semantic domains, along with their types. Everything from module 1 is carried forward unchanged; only the call stack S is new.

Metavariables
Metavariable Type    Description
n A natural number
v V A value
A location
R ℕ → ℒ A window on VM registers
S A call stack (of type ((ℕ → ℒ) × ℐ list × ℕ) list)
G ℕ → ℒ The global variables
σ ℒ → V The store
L ℕ → V The literal pool
i An instruction
I ℐ list A sequence of instructions
δ Semantic function for an ALU instruction

Notation for the instruction stream

An instruction stream is a sequence of instructions. In this semantics, I notate both cons and append using the small infix dot ·. At need that dot can also stand for snoc. So if \(i\) is an instruction and \(I\), \(I_1\), and \(I_2\) are sequences of instructions, then each of the following notations stands for a sequence of instructions:

\(i\cdot I\)
\(I_1 \cdot I_2\)
\(I \cdot i\)

The VM, like a real machine, has an instruction pointer that points to a single instruction in the stream. That pointer is represented as a fat dot •. For example, if the VM has already executed instruction sequence \(I_1\), is about to execute an add instruction, and then will continue with instruction sequence \(I_2\), the control part of its state will be notated as \(I_1 \bullet \mathtt{add}(X,Y,Z) \cdot I_2\). When the form of the instruction about to be executed doesn’t matter, the control part is notated just as \(I_1 \bullet I_2\).

State of the abstract machine

The state of an abstract machine has these elements:

I₁•i·I₂ An instruction stream, with the IP pointing to i
I₁•I₂ Alternate notation for instruction stream, with the IP pointing to I₂
R A window onto register file; every register stands for a location
L A literal pool; every valid index stands for a value
G A global-variable table mapping every valid index to a location
σ A store specifying the value of every register and global variable
S A call stack

What’s new here is the call stack. In module 1, the VM state has a single instruction stream and a single program counter. Now, we extend VM state with a stack of suspended activations. A suspended activation holds the state of a function that executed a call instruction and is waiting for the call to return. A suspended activation has three components:

The suspended activations are organized into a call stack, which is a list of frames each of the form (R, I₁•i·I₂, r).

The machine’s state is written \(\langle I_1 \bullet i \cdot I_2, R, L, G, S, \sigma\rangle\), or when we don’t care about the form of instruction \(i\), \(\langle I_1 \bullet I_2, R, L, G, S, \sigma\rangle\).

Transitions to do with calls

To read the semantics of calls, it will help to know the informal calling convention:

Returns

The easiest rule to understand is the rule for a return instruction. It pops a suspended activation off the call stack, using the components to update the machine’s state. The result returned is copied from register \(r\) to the destination register \(r'\).

\[\frac{ }{ \begin{array}{l} \langle I_1 \bullet \mathtt{return}\;r \cdot I_2, R, L, G, (R', I'_1 \bullet I'_2, r') \cdot S, \sigma\rangle \rightarrow\\ \qquad\qquad\qquad\qquad \langle I'_1 \bullet I'_2, R', L, G, S, \sigma\{R'(r') \mapsto \sigma (R(r))\} \rangle \end{array} } \]

The final store \(\sigma\{R'(r') \mapsto \sigma (R(r))\}\) shows that the result register \(r\) is named in the register-name space of the callee, but the destination register \(r'\) is named in the register-name space of the caller. That’s as expected, because the register name \(r\) comes from the return instruction, which is part of the callee’s name space, but \(r'\) comes from the call instruction, which is part of the caller’s name space.

Ordinary calls

The next easiest rule is for the call instruction. It suspends execution of the current function, copying parts the current instruction stream and register window to a new suspended activation on the call stack. It shifts the register window from \(R\) to \(R \oplus r_0\), and it continues execution with the called function. The register-window shift is described below.

\[\frac{ \sigma(R(r 0)) = \mathtt{funcode}\ x_1 \cdots x_n \mathbin{\mathtt{=>}} I }{ \begin{array}{l} \langle I_1 \bullet r \mathbin{:=} \mathtt{call}\;r_0(r_1, \ldots, r_n) \cdot I_2, R, L, G, S, \sigma\rangle \rightarrow\\ \qquad \langle \epsilon \bullet I, R \oplus r_0, L, G, (R, I_1 \cdot r \mathbin{:=} \mathtt{call}\;r_0(r_1, \ldots, r_n) \bullet I_2, r) \cdot S, \sigma \rangle \end{array} }\]

The call succeeds only if register \(r_0\) contains a function value and the function expects \(n\) parameters. In that case, the VM continues execution at the beginning of the instruction stream \(I\), which is the body of the function.

Tail calls

The third and final call-related instruction is tailcall. The syntactic form \(\mathtt{tailcall}(x, x_1, \ldots, x_n)\) behaves essentially like \(\mathtt{return}\;(\mathtt{call}(x, x_1, \ldots, x_n))\), but nothing is pushed onto the call stack. The called function replaces the currently active function, and the register window does not shift. What does shift are register values; the function being called is copied to register 0, and the values in registers \(r_1\) to \(r_n\) are copied into registers 1 to \(n\). The store \(\sigma_i\) represents the state of the store after register \(i\) has been initialized in the callee’s register-name space.

\[\begin{array}{c} \sigma(R(r_0)) = \mathtt{funcode}\ x_1 \cdots x_n \mathbin{\mathtt{=>}} I\\ \sigma_0 = \sigma\{R(0) \mapsto \sigma(R(r_0))\}\\ \frac{ \displaystyle \sigma_i = \sigma_{i-1}\{R(i) \mapsto \sigma(R(r_i))\}, 1 \le i \le n }{ \displaystyle \langle I_1 \bullet \mathtt{tailcall}\;r_0(r_1, \ldots, r_n) \cdot I_2, R, L, G, S, \sigma\rangle \rightarrow \langle \epsilon \bullet I, R, L, G, S, \sigma_n \rangle } \end{array} \]

Semantics of register windows

In module 1, the register file \(R\) maps exactly 256 locations: each of the register numbers 0 to 255 is mapped to a distinct location. To support calls, we extend that register file to support many more locations, perhaps thousands of them. The register file looks different to each live activation; each activation has its own view or “window” onto the underlying register storage. That window is what is meant by the notation \(R\). Any particular activation can name only registers \(R(0)\) to \(R(255)\), but in every activation, the underlying locations are different. To shift a register window, I use notation \(R\oplus k\); because \(R\) is a function mapping register numbers to locations, the shift is easy to define:

\[(R\oplus k)(i) = R(k+i)\text.\]