Preliminaries
The loader reads instructions from a virtual object file and appends them to an instruction stream. The central operation produces a single VM instruction \(i\) and may have a side effect on a literal pool. The semantic domains in play are as follows
\(V\) | VM values |
đ© | Names of vScheme variables (note that \(đ© \subset V\)) |
â | Natural numbers |
â | instructions |
đȘ | Object code (a sequence of tokens) |
Here are the metavariables used to notate elements of or functions on the various semantic domains, along with their types.
Metavariable | Type | Description |
---|---|---|
\(n\), \(k\) | â | A natural number |
\(v\) | \(V\) | A value |
\(L\) | â â \(V\) | The literal pool |
\(N\) | đ© â â Sl | ot numbers of global variables |
\(i\) | â | An instruction |
\(I\) | â list | A sequence of instructions |
\(O\) | đȘ list | A sequence of tokens |
The literal pool is meant to be a partial function: itâs defined only on a subset of natural numbers.
Loading judgment and transitions
The loader takes a sequence of tokens, consumes a prefix of that sequence, and returns an instruction. As a side effect, it may modify the literal pool or the placement of global variables. To express this action in a judgment of operational semantics, I supply an initial literal pool \(L\) and a final literal pool \(L'\), and the same for placements of global variables \(N\) and \(N'\):
\(âšO,L,Nâ©ââši,O',L',N'â©\)
The grammar for virtual object code shows three forms of instruction, and there is a set of transitions for each form.
The ordinary instruction form is actually four forms, depending on whether the instruction in question expects zero, one, two, or three operands. IÂ show just the three-operand case:
\[âš\mathit{opcode} · X · Y · Z · \verb+\n+ · O, L, Nâ©ââš\mathtt{eR3}(\mathit{opcode}, X, Y, Z), O, L, Nâ©\]
The literal form must convert a sequence of tokens to a value. IÂ imagine a function
asLiteral
that takes a sequence of tokens \(O'\) and returns the value \(v\) that the tokens code for. Then if the opcode wants an argument in the literal pool, the value \(v\) is (nondeterministically) guaranteed to be in the literal pool \(L'\):\[\frac{ \begin{array}{c} \mathit{wantsLiteral}(\mathit{opcode})\\ \mathtt{asLiteral}(O') = v \qquad L â L' \qquad L'(k) = v \end{array} }{âš\mathit{opcode} · X · O' · \verb+\n+ · O, L, Nâ©ââš\mathtt{eR1U16}(\mathit{opcode}, X, k), O, L', Nâ©} \]
The key premise in this rule is \(L â L'\). We say that \(L\) approximates \(L'\). It means that the domain of \(L\) is a subset of the domain of \(L'\), and that the two functions agree on that domain. In other words, if we imagine a function as a set of ordered pairs, \(L\) is a subset of \(L'\). Since in semantics we use functions for everything, this is a relatively common construction. We can formalize it like this:
\[\frac{ dom L â dom L' \qquad â x â dom L : L(x) = L'(x) }{ L â L' }\]
The other thing to notice about the literal-form rule is that \(k\), which is the index of \(v\) in \(L'\), is not specified. \(L'\) just has to index \(v\) somewhere. The specification is purposefully nondeterministicâit gives you the freedom to use a very fast implementation of
literal_slot
, or a very space-efficient one, or possibly both.Instructions that get or set global variables also use the literal form, but they work with the global-variable table, not with the literal pool. The loader needs a mapping from the name of each global variable to its slot number. (The slot number then determines the location via the mapping \(G\) described in module 1. If you want to specify the operational semantics of these forms, you can earn a depth point.
The last form is the form to load a function. For this one, I need a new judgment, \(âšO,L,N,nâ©ââšI,O',L',N'â©\), which means âload a sequence of \(n\) instructions.â The last form of
<instruction>
codes for such a load (corrected to show three states for the literal poolâinitial \(L\), intermediate \(L'\), and final \(L''\)):\[\frac{ \begin{array}{c} âšO,L,N,nâ©ââšI,O',L',N'â© \qquad â r â I : r < m\\ v â \mathtt{mkVMFunctionValue}(a, I, m)\\ L' â L''\qquad L''(k) = v \end{array} }{ âš\mathtt{.load} · X · \mathtt{function} · a · n · \verb+\n+ · O, L,Nâ©ââš\mathtt{eR1U16}(\mathtt{LoadLiteral}, X, k), O', L'',N'â©} \]
The idea is to load a sequence of \(n\) instructions and put the result in a function. The arity of the function is given by operand \(a\), and the
nregs
field \(m\) has to be bigger than any register appearing in the instruction sequence \(I\).A sequence of instructions is loaded just as you would expect. Loading 0 instructions produces an empty sequence Δ:
\[\frac{}{âšO,L,N,0â©ââšÎ”,O,L,Nâ©}\]
AÂ nonempty sequence of instructions is loaded one at a time:
\[\frac{ âšO,L,Nâ©ââši,O',L',N'â©\qquad âšO',L',N',n-1â©ââšI,O'',L'',N''â© }{ âšO,L,N,nâ©ââši·I,O'',L'',N''â© } \]
What the operational semantics buys us
The grammar tells us if a sequence of tokens constitutes valid virtual object code, but to know what to do with a valid sequence of tokens, we need the operational semantics. This situation is just like the situation we have in CS 105: a grammar tells us about syntax, but to know what happen when we run the code, we need the operational semantics. The only difference here is that the abstract machine is running at load time, to load stuff into the VM state.
This particular operational semantics has a couple of points that IÂ like:
I like the nondeterminism for the literals: âJust make sure my stuff is in the literal pool; I donât care where you put it.â
I like that the semantics doesnât check if the number of operands is actually compatible with the opcode. The real-life loader could skip the checks also, and on correct programs it would work just as well. But if we make a mistake in our compiler or assembler, those checks are awfully useful, so our implementation will put some in.
How to use the operational semantics
IÂ hope you wonât need this semantics, but if you have questions about how âload functionâ is supposed to work, the semantics may answer them. (The nregs
may remain a bit mysterious.)