# Introduction

This semantics extends the basic semantic description from module 1 by adding a call stack and register windows. Where material from module 1 is presented a second time, significant extensions are highlighted.

# Semantic domains

The VM computes with values V. Registers are numbered with natural numbers ℕ, as are slots in the literal pool and the global-variable table. Instructions ℐ are left unspecified but are notated in typewriter font, as in the examples $$\mathtt{halt}$$, $$\mathtt{check}(X, YZ)$$, and $$\mathtt{add}(X, Y, Z)$$.

| Domain | Description |
|--------|-------------|
| V | VM values |
| ℕ | Natural numbers |
| ℒ | Locations (in the store) |
| ℐ | Instructions |

# Metavariables

Here are the metavariables used to notate elements of or functions on the various semantic domains, along with their types. Everything from module 1 is carried forward unchanged; only the call stack S is new.

| Metavariable | Type | Description |
|--------------|------|-------------|
| $$n$$ | ℕ | A natural number |
| $$v$$ | V | A value |
| $$\ell$$ | ℒ | A location |
| $$R$$ | ℕ → ℒ | A window on VM registers |
| $$S$$ | ((ℕ → ℒ) × ℐ list × ℕ) list | A call stack |
| $$G$$ | ℕ → ℒ | The global variables |
| $$\sigma$$ | ℒ → V | The store |
| $$L$$ | ℕ → V | The literal pool |
| $$i$$ | ℐ | An instruction |
| $$I$$ | ℐ list | A sequence of instructions |
| $$\delta$$ | | Semantic function for an ALU instruction |

# Notation for the instruction stream

An instruction stream is a sequence of instructions. In this semantics, I notate both cons and append using the small infix dot ·. When needed, the same dot also stands for snoc. So if $$i$$ is an instruction and $$I$$, $$I_1$$, and $$I_2$$ are sequences of instructions, then each of the following notations stands for a sequence of instructions:

$$i\cdot I$$
$$I_1 \cdot I_2$$
$$I \cdot i$$

The VM, like a real machine, has an instruction pointer that points to a single instruction in the stream. That pointer is represented as a fat dot •. For example, if the VM has already executed instruction sequence $$I_1$$, is about to execute an add instruction, and then will continue with instruction sequence $$I_2$$, the control part of its state will be notated as $$I_1 \bullet \mathtt{add}(X,Y,Z) \cdot I_2$$. When the form of the instruction about to be executed doesn’t matter, the control part is notated just as $$I_1 \bullet I_2$$.
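As a reading aid, the control part $$I_1 \bullet I_2$$ can be modeled as a zipper over the instruction list: two lists, with the fat dot between them. The following Python sketch is purely illustrative (the representation and names are my own, not part of the semantics); executing one instruction moves it from the front of `todo` to the end of `done`.

```python
# A zipper model of the control part I1 . I2: (done, todo) pairs of lists.
# The instruction pointer sits between the two lists.

def step_ip(control):
    """Advance the IP one instruction: I1 . i . I2  becomes  (I1 . i) . I2."""
    done, todo = control
    if not todo:
        raise ValueError("instruction pointer fell off the end of the stream")
    return (done + [todo[0]], todo[1:])

control = ([], ["add", "sub", "halt"])   # IP points at "add"
control = step_ip(control)               # IP now points at "sub"
assert control == (["add"], ["sub", "halt"])
```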

# State of the abstract machine

The state of an abstract machine has these elements:

| Component | Description |
|-----------|-------------|
| $$I_1 \bullet i \cdot I_2$$ | An instruction stream, with the IP pointing to $$i$$ |
| $$I_1 \bullet I_2$$ | Alternate notation for an instruction stream, with the IP pointing to the start of $$I_2$$ |
| $$R$$ | A window onto the register file; every register stands for a location |
| $$L$$ | A literal pool; every valid index stands for a value |
| $$G$$ | A global-variable table mapping every valid index to a location |
| $$\sigma$$ | A store specifying the value of every register and global variable |
| $$S$$ | A call stack |

What’s new here is the call stack. In module 1, the VM state has a single instruction stream and a single program counter. Now, we extend VM state with a stack of suspended activations. A suspended activation holds the state of a function that executed a call instruction and is waiting for the call to return. A suspended activation has three components:

• The instruction stream and program counter, written $$I_1 \bullet I_2$$, record where the function will resume execution when the call returns. This notation means the same as the “$$I_1 \bullet i \cdot I_2$$” notation, except that it doesn’t single out the first instruction after the program counter—that instruction $$i$$ is (implicitly) the first instruction in the list $$I_2$$.

• The destination register, written $$r$$, records where the result value from the call will be placed.

• The register window, written $$R$$, records the locations that the suspended activation associates with registers 0 to 255. These locations are different for every activation.

The suspended activations are organized into a call stack, which is a list of frames, each of the form $$(R, I_1 \bullet I_2, r)$$.

The machine’s state is written $$\langle I_1 \bullet i \cdot I_2, R, L, G, S, \sigma\rangle$$, or when we don’t care about the form of instruction $$i$$, $$\langle I_1 \bullet I_2, R, L, G, S, \sigma\rangle$$.
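For readers who think in code, the six-component state $$\langle I_1 \bullet I_2, R, L, G, S, \sigma\rangle$$ might be rendered as a record. This Python sketch is purely illustrative; every field name is my own assumption, not part of the semantics.

```python
# A hypothetical rendering of the machine state <I1 . I2, R, L, G, S, sigma>.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Frame:
    """One suspended activation (R, I1 . I2, r)."""
    window: Callable[[int], int]   # R: register number -> location
    control: Tuple[list, list]     # (I1, I2): where to resume after the call
    dest: int                      # r: destination register for the result

@dataclass
class State:
    control: Tuple[list, list]     # (I1, I2) with the IP between them
    window: Callable[[int], int]   # R: the current activation's window
    literals: Dict[int, object]    # L: literal-pool index -> value
    globals_: Dict[int, int]       # G: global index -> location
    stack: List[Frame]             # S: suspended activations, innermost first
    store: Dict[int, object]       # sigma: location -> value
```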

# Transitions to do with calls

To read the semantics of calls, it will help to know the informal calling convention:

• Whatever object is called appears to the callee code as a value in register 0. (For now, we can only call functions, but in module 10, we will call closures.)

• Actual parameters, if any, appear to the callee in registers 1 to $$n$$.

• The caller’s call instruction is written as $$r \mathrel{:=} \mathtt{call}\;r_0(r_1, \ldots, r_n)$$, where $$r_0$$ is the number of the register holding the function being called, and $$r_1$$ to $$r_n$$ are the numbers of the registers holding the actual parameters (if any).

The notation makes it look as if register numbers $$r_0$$ to $$r_n$$ are arbitrary, and the choice of the function register $$r_0$$ is indeed arbitrary, but the coding of the call instruction forces the remaining numbers to follow $$r_0$$ consecutively. Register number $$r_0$$ is coded in the $$Y$$ field of the call instruction, and register number $$r_n$$ is coded in the $$Z$$ field. The value of $$n$$ is therefore $$Z-Y$$, and all the register numbers in between are computed by interpolation. Once $$r_0$$ is chosen, there’s no choice about $$r_n$$: $$r_n = r_0 + n$$, where $$n$$ is the number of parameters passed at the call site.

By contrast, the destination register $$r$$ is arbitrary; it is coded in the $$X$$ field of the call instruction.

## Returns

The easiest rule to understand is the rule for a return instruction. It pops a suspended activation off the call stack, using the components to update the machine’s state. The result returned is copied from register $$r$$ to the destination register $$r'$$.

$\frac{ }{ \begin{array}{l} \langle I_1 \bullet \mathtt{return}\;r \cdot I_2, R, L, G, (R', I'_1 \bullet I'_2, r') \cdot S, \sigma\rangle \rightarrow\\ \qquad\qquad\qquad\qquad \langle I'_1 \bullet I'_2, R', L, G, S, \sigma\{R'(r') \mapsto \sigma (R(r))\} \rangle \end{array} }$

The final store $$\sigma\{R'(r') \mapsto \sigma (R(r))\}$$ shows that the result register $$r$$ is named in the register-name space of the callee, but the destination register $$r'$$ is named in the register-name space of the caller. That’s as expected, because the register name $$r$$ comes from the return instruction, which is part of the callee’s name space, but $$r'$$ comes from the call instruction, which is part of the caller’s name space.
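The return rule can be transcribed as a step function. This is a sketch under assumed representations: control is a pair of lists, windows are functions, the store is a dictionary, and $$L$$ and $$G$$ are omitted because the rule leaves them unchanged.

```python
# Sketch of the return rule: pop the top frame (R', I1' . I2', r') and
# copy the result from the callee's register r to the caller's register r'.

def step_return(state, r):
    (I1, I2), R, S, sigma = state        # state without L and G (unchanged)
    Rp, resume, rp = S[0]                # top frame: (R', I1' . I2', r')
    sigma2 = dict(sigma)
    sigma2[Rp(rp)] = sigma[R(r)]         # sigma{R'(r') |-> sigma(R(r))}
    return (resume, Rp, S[1:], sigma2)   # resume the caller, frame popped
```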

## Ordinary calls

The next easiest rule is for the call instruction. It suspends execution of the current function, saving the current instruction stream and register window in a new suspended activation on the call stack. It shifts the register window from $$R$$ to $$R \oplus r_0$$, and it continues execution with the called function. The register-window shift is described below.

$\frac{ \sigma(R(r_0)) = \mathtt{funcode}\ x_1 \cdots x_n \mathbin{\mathtt{=>}} I }{ \begin{array}{l} \langle I_1 \bullet r \mathbin{:=} \mathtt{call}\;r_0(r_1, \ldots, r_n) \cdot I_2, R, L, G, S, \sigma\rangle \rightarrow\\ \qquad \langle \epsilon \bullet I, R \oplus r_0, L, G, (R, I_1 \cdot r \mathbin{:=} \mathtt{call}\;r_0(r_1, \ldots, r_n) \bullet I_2, r) \cdot S, \sigma \rangle \end{array} }$

The call succeeds only if register $$r_0$$ contains a function value and the function expects $$n$$ parameters. In that case, the VM continues execution at the beginning of the instruction stream $$I$$, which is the body of the function.
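The call rule, too, can be sketched as a step function, under the same assumed representations as before. The premise that looks up the function value and checks its arity is elided here; the function body $$I$$ is passed in as a parameter.

```python
# Sketch of the call rule: save (R, resume point, r) on the stack, shift
# the window by r0, and begin executing the function body I.

def step_call(state, r, r0, body_I):
    (I1, I2), R, S, sigma = state                 # state without L and G
    call_instr, rest = I2[0], I2[1:]              # I2 = call-instruction . rest
    frame = (R, (I1 + [call_instr], rest), r)     # resume just after the call
    shifted = lambda i: R(r0 + i)                 # the window R (+) r0
    return (([], body_I), shifted, [frame] + S, sigma)
```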

## Tail calls

The third and final call-related instruction is tailcall. The syntactic form $$\mathtt{tailcall}\;r_0(r_1, \ldots, r_n)$$ behaves essentially like $$\mathtt{return}\;(\mathtt{call}\;r_0(r_1, \ldots, r_n))$$, but nothing is pushed onto the call stack. The called function replaces the currently active function, and the register window does not shift. What does shift are register values: the function being called is copied to register 0, and the values in registers $$r_1$$ to $$r_n$$ are copied into registers 1 to $$n$$. The store $$\sigma_i$$ represents the state of the store after register $$i$$ has been initialized in the callee’s register-name space.

$\frac{ \begin{array}{c} \sigma(R(r_0)) = \mathtt{funcode}\ x_1 \cdots x_n \mathbin{\mathtt{=>}} I\\ \sigma_0 = \sigma\{R(0) \mapsto \sigma(R(r_0))\}\\ \sigma_i = \sigma_{i-1}\{R(i) \mapsto \sigma(R(r_i))\}, \quad 1 \le i \le n \end{array} }{ \langle I_1 \bullet \mathtt{tailcall}\;r_0(r_1, \ldots, r_n) \cdot I_2, R, L, G, S, \sigma\rangle \rightarrow \langle \epsilon \bullet I, R, L, G, S, \sigma_n \rangle }$
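A sketch of the tailcall rule in the same style: no frame is pushed, the window stays put, and the callee object and arguments are copied down to registers 0 to $$n$$. Note that each update reads from the *original* store $$\sigma$$, exactly as $$\sigma_i = \sigma_{i-1}\{R(i) \mapsto \sigma(R(r_i))\}$$ does, so overlapping register numbers cause no trouble.

```python
# Sketch of the tailcall rule: replace the running function in place.

def step_tailcall(state, r0, args, body_I):
    (I1, I2), R, S, sigma = state            # state without L and G
    sigma_n = dict(sigma)
    sigma_n[R(0)] = sigma[R(r0)]             # sigma_0: function into register 0
    for i, ri in enumerate(args, start=1):   # sigma_i: read sigma, not sigma_{i-1}
        sigma_n[R(i)] = sigma[R(ri)]
    return (([], body_I), R, S, sigma_n)     # same window, same stack
```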

# Semantics of register windows

In module 1, the register file $$R$$ maps exactly 256 registers: each of the register numbers 0 to 255 is mapped to a distinct location. To support calls, we extend the register file with many more locations, perhaps thousands of them. The register file looks different to each live activation; each activation has its own view, or “window,” onto the underlying register storage. That window is what is meant by the notation $$R$$. Any particular activation can name only registers $$R(0)$$ to $$R(255)$$, but each activation’s window maps those names to different underlying locations. To shift a register window, I use the notation $$R\oplus k$$; because $$R$$ is a function mapping register numbers to locations, the shift is easy to define:

$(R\oplus k)(i) = R(k+i)\text.$
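Because $$R$$ is just a function, the shift is one line of code. A sketch, with an assumed base window:

```python
# The window shift R (+) k is composition with addition: (R (+) k)(i) = R(k + i).

def shift(R, k):
    return lambda i: R(k + i)

R = lambda i: 1000 + i        # an assumed base window: register i at location 1000 + i
R2 = shift(R, 8)              # the shifted window R (+) 8
assert R2(0) == R(8) == 1008  # the callee's register 0 is the caller's register 8
```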