Introduction
This semantics extends the basic semantic description from module 1: it adds a call stack and register windows. Where material from module 1 is presented for a second time, significant extensions are highlighted
Semantic domains
The VM computes with values V. Registers are numbered with natural numbers ℕ, as are slots in the literal pool and the global-variable table. Instructions ℐ are left unspecified but will be notated using typewriter font, as in examples halt
, check
(\(X\),\(YZ\)), or add
(\(X\), \(Y\), \(Z\)).
V | VM values |
ℕ | Natural numbers |
ℒ | locations (in the store) |
ℐ | instructions |
Metavariables
Here are the metavariables used to notate elements of or functions on the various semantic domains, along with their types. Everything from module 1 is carried forward unchanged; only the call stack S is new.
Metavariable | Type | Description |
---|---|---|
n | ℕ | A natural number |
v | V | A value |
ℓ | ℒ | A location |
R | ℕ → ℒ | A window on VM registers |
S | ⋯ | A call stack (of type ((ℕ → ℒ) × ℐ list × ℕ) list ) |
G | ℕ → ℒ | The global variables |
σ | ℒ → V | The store |
L | ℕ → V | The literal pool |
i | ℐ | An instruction |
I | ℐ list | A sequence of instructions |
δ | ⋯ | Semantic function for an ALU instruction |
Notation for the instruction stream
An instruction stream is a sequence of instructions. In this semantics, I notate both cons
and append
using the small infix dot ·. At need that dot can also stand for snoc
. So if \(i\) is an instruction and \(I\), \(I_1\), and \(I_2\) are sequences of instructions, then each of the following notations stands for a sequence of instructions:
\(i\cdot I\)
\(I_1 \cdot I_2\)
\(I \cdot i\)
The VM, like a real machine, has an instruction pointer that points to a single instruction in the stream. That pointer is represented as a fat dot •. For example, if the VM has already executed instruction sequence \(I_1\), is about to execute an add
instruction, and then will continue with instruction sequence \(I_2\), the control part of its state will be notated as \(I_1 \bullet \mathtt{add}(X,Y,Z) \cdot I_2\). When the form of the instruction about to be executed doesn’t matter, the control part is notated just as \(I_1 \bullet I_2\).
State of the abstract machine
The state of an abstract machine has these elements:
I₁•i·I₂ | An instruction stream, with the IP pointing to i |
I₁•I₂ | Alternate notation for instruction stream, with the IP pointing to I₂ |
R | A window onto register file; every register stands for a location |
L | A literal pool; every valid index stands for a value |
G | A global-variable table mapping every valid index to a location |
σ | A store specifying the value of every register and global variable |
S | A call stack |
What’s new here is the call stack. In module 1, the VM state has a single instruction stream and a single program counter. Now, we extend VM state with a stack of suspended activations. A suspended activation holds the state of a function that executed a call
instruction and is waiting for the call to return. A suspended activation has three components:
The instruction stream and program counter, written \(I₁•I₂\), record where the function will resume execution when the call returns. This notation means the same as the “\(I•i·I\)” notation, except it doesn’t distinguish the first instruction after the program counter—that instruction \(i\) is (implicitly) the first instruction in the list \(I₂\).
The destination register, written \(r\), records where the result value from the call will be placed.
The register window, written \(R\), records the locations that the suspended activation associates with registers 0 to 255. These locations are different for every activation.
The suspended activations are organized into a call stack, which is a list of frames each of the form (R, I₁•i·I₂, r).
The machine’s state is written \(\langle I_1 \bullet i \cdot I_2, R, L, G, S, \sigma\rangle\), or when we don’t care about the form of instruction \(i\), \(\langle I_1 \bullet I_2, R, L, G, S, \sigma\rangle\).
Transitions to do with calls
To read the semantics of calls, it will help to know the informal calling convention:
Whatever object is called appears to the callee code as a value in register 0. (For now, we can only call functions, but in module 10, we will call closures.)
Actual parameters, if any, appear to the callee in registers 1 to \(n\).
The caller’s
call
instruction is written as \(r \mathrel{:=} \mathtt{call}\;r_0(r_1, \ldots, r_n)\), where \(r_0\) is the number of the register holding the function being called, and \(r_1\) to \(r_n\) are the numbers of the registers holding the actual parameters (if any).The notation makes it look as if register numbers \(r_0\) to \(r_n\) are arbitrary, and the choice of the function register \(r_0\) is indeed arbitrary, but the coding of the call instruction forces the remaining numbers to follow \(r_0\) consecutively. Register number \(r_0\) is coded in the \(Y\) field of the
call
instruction, and register number \(r_n\) is coded in the \(Z\) field. The value of \(n\) is therefore \(Z-Y\), and all the register numbers in between are computed by interpolation. Once \(r_0\) is chosen, there’s no choice about \(r_n\): \(r_n = r_0 + n\), where \(n\) is the number of parameters passed at the call site.By contrast, the destination register \(r\) is arbitrary; it is coded in the \(X\) field of the
call
instruction.
Returns
The easiest rule to understand is the rule for a return
instruction. It pops a suspended activation off the call stack, using the components to update the machine’s state. The result returned is copied from register \(r\) to the destination register \(r'\).
\[\frac{ }{ \begin{array}{l} \langle I_1 \bullet \mathtt{return}\;r \cdot I_2, R, L, G, (R', I'_1 \bullet I'_2, r') \cdot S, \sigma\rangle \rightarrow\\ \qquad\qquad\qquad\qquad \langle I'_1 \bullet I'_2, R', L, G, S, \sigma\{R'(r') \mapsto \sigma (R(r))\} \rangle \end{array} } \]
The final store \(\sigma\{R'(r') \mapsto \sigma (R(r))\}\) shows that the result register \(r\) is named in the register-name space of the callee, but the destination register \(r'\) is named in the register-name space of the caller. That’s as expected, because the register name \(r\) comes from the return
instruction, which is part of the callee’s name space, but \(r'\) comes from the call
instruction, which is part of the caller’s name space.
Ordinary calls
The next easiest rule is for the call instruction. It suspends execution of the current function, copying parts the current instruction stream and register window to a new suspended activation on the call stack. It shifts the register window from \(R\) to \(R \oplus r_0\), and it continues execution with the called function. The register-window shift is described below.
\[\frac{ \sigma(R(r 0)) = \mathtt{funcode}\ x_1 \cdots x_n \mathbin{\mathtt{=>}} I }{ \begin{array}{l} \langle I_1 \bullet r \mathbin{:=} \mathtt{call}\;r_0(r_1, \ldots, r_n) \cdot I_2, R, L, G, S, \sigma\rangle \rightarrow\\ \qquad \langle \epsilon \bullet I, R \oplus r_0, L, G, (R, I_1 \cdot r \mathbin{:=} \mathtt{call}\;r_0(r_1, \ldots, r_n) \bullet I_2, r) \cdot S, \sigma \rangle \end{array} }\]
The call succeeds only if register \(r_0\) contains a function value and the function expects \(n\) parameters. In that case, the VM continues execution at the beginning of the instruction stream \(I\), which is the body of the function.
Tail calls
The third and final call-related instruction is tailcall
. The syntactic form \(\mathtt{tailcall}(x, x_1, \ldots, x_n)\) behaves essentially like \(\mathtt{return}\;(\mathtt{call}(x, x_1, \ldots, x_n))\), but nothing is pushed onto the call stack. The called function replaces the currently active function, and the register window does not shift. What does shift are register values; the function being called is copied to register 0, and the values in registers \(r_1\) to \(r_n\) are copied into registers 1 to \(n\). The store \(\sigma_i\) represents the state of the store after register \(i\) has been initialized in the callee’s register-name space.
\[\begin{array}{c} \sigma(R(r_0)) = \mathtt{funcode}\ x_1 \cdots x_n \mathbin{\mathtt{=>}} I\\ \sigma_0 = \sigma\{R(0) \mapsto \sigma(R(r_0))\}\\ \frac{ \displaystyle \sigma_i = \sigma_{i-1}\{R(i) \mapsto \sigma(R(r_i))\}, 1 \le i \le n }{ \displaystyle \langle I_1 \bullet \mathtt{tailcall}\;r_0(r_1, \ldots, r_n) \cdot I_2, R, L, G, S, \sigma\rangle \rightarrow \langle \epsilon \bullet I, R, L, G, S, \sigma_n \rangle } \end{array} \]
Semantics of register windows
In module 1, the register file \(R\) maps exactly 256 locations: each of the register numbers 0 to 255 is mapped to a distinct location. To support calls, we extend that register file to support many more locations, perhaps thousands of them. The register file looks different to each live activation; each activation has its own view or “window” onto the underlying register storage. That window is what is meant by the notation \(R\). Any particular activation can name only registers \(R(0)\) to \(R(255)\), but in every activation, the underlying locations are different. To shift a register window, I use notation \(R\oplus k\); because \(R\) is a function mapping register numbers to locations, the shift is easy to define:
\[(R\oplus k)(i) = R(k+i)\text.\]