All term we will be implementing new languages by adding a little bit to a language we already have. Our first new language is virtual object code, which adds these three little features to virtual-machine code:

It has a well-defined representation on disk.
It expresses a literal value directly as part of an instruction, instead of indirectly as an index into a literal pool.
It refers to each global variable directly by name, instead of indirectly as an index into a global-variable table.

The design I have chosen for the on-disk format exemplifies a strategy that is good for putting any kind of binary data on disk: represent a data structure on disk as a program that, when interpreted, materializes the data structure. Virtual object code is just such a program, and to describe some aspects of it, I have given it a tiny operational semantics.
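To make the "data as a program" strategy concrete, here is a minimal sketch in C, unrelated to the actual virtual object code format. It writes a list of integers as a program made of `append <n>` commands and reads it back by interpreting those commands. The names `IntList`, `dump_list`, `load_list`, and the `append` command are all hypothetical, invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum { MAX_ITEMS = 16 };

/* Hypothetical in-memory data structure: a small list of ints. */
typedef struct {
    int items[MAX_ITEMS];
    size_t count;
} IntList;

/* Writes the list as a program: one `append <n>` line per element.
   Returns false if the output buffer is too small. */
static bool dump_list(const IntList *list, char *out, size_t outsize) {
    assert(outsize > 0);
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < list->count; i++) {
        int n = snprintf(out + used, outsize - used, "append %d\n", list->items[i]);
        if (n < 0 || (size_t) n >= outsize - used)
            return false;
        used += (size_t) n;
    }
    return true;
}

/* Interprets the program, rebuilding the list one command at a time.
   Returns false on an unknown command or if the list overflows. */
static bool load_list(const char *program, IntList *list) {
    int value, consumed;
    list->count = 0;
    while (sscanf(program, " append %d%n", &value, &consumed) == 1) {
        if (list->count == MAX_ITEMS)
            return false;
        list->items[list->count++] = value;
        program += consumed;
    }
    /* Anything left besides blanks and newlines is not a known command. */
    return program[strspn(program, " \n")] == '\0';
}
```

The loader never needs to know the layout of the list in memory; it only needs to know how to carry out each command, which is the property that makes the approach attractive for an on-disk format.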

The main new technology needed to implement virtual object code is parsing, which is described in another handout.

Grammar

You may have seen grammars and EBNF in other courses, including 105. If not, or if you want a refresher, there is decent material from Matt Might, the Wikipedia article is not terrible, and there’s a decent short summary from Pete Jinks, which also mentions railroad diagrams. Or you can consult the parsing sources mentioned in the module handout.

Here’s the grammar of our virtual object code:

<modules>     ::= { <module> }

<module>      ::= .load module <length> \n
                  <body>
                  
<body>        ::= { <instruction> }

<instruction> ::= <opcode> {<operand>} \n
               |  <opcode> <register> <literal> \n
               |  .load <register> function <arity> <length> \n
                  <body>

<literal>     ::= true | false | <number> | emptylist | nil
               |  string <length> { <byte> }

Where

<opcode> is the mnemonic name of an opcode (single token)
<operand> is an integer literal representing a register number or immediate value
<register> is an integer literal in the range 0..255
<number> is a numeric literal (integer or floating point)
<length> is an integer literal
<arity> is an integer literal
<byte> is an integer literal in the range 0..255

A virtual object file is a program that defines a list of modules and also has side effects on a VM state. The program has a well-defined semantics, which I’ll describe informally.

As you’ll see if you look at file loader.h, a module is a single function, and a function is defined by a sequence of instructions. Each module begins with a line that says how many instructions are in the module, and each instruction appears on one line.
A single instruction is specified by its opcode and operands. The opcode is a name (as defined in name.h and parsed in tokens.h); it’s a maximal sequence of non-blank characters that does not represent a number. Each operand is a register number or literal unsigned integer, except for one special case: an operand can be a literal value.
Not all values can be expressed as literals in virtual object code, but it is possible to specify Booleans, numbers, the empty list, the special value nil, and strings. Values that can’t be expressed as literals, like cons cells, have to be created at run time by executing VM instructions.
There is also one special-case instruction: using the .load form, a literal function can be loaded into a register. The .load form specifies the arity of the function (the number of parameters it is expecting), and the length (the number of instructions in its body). The instructions of the function body follow the .load form immediately.
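The `<literal>` production can be read almost directly as a parser. The sketch below is one way to do it in C, assuming the literal arrives as a string of blank-separated tokens. The `Literal` type, `next_token`, and `parse_literal` are hypothetical names for this illustration; they are not the interfaces declared in tokens.h or loader.h, and the real loader may represent values quite differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef enum { LIT_BOOL, LIT_NUMBER, LIT_EMPTYLIST, LIT_NIL, LIT_STRING } LitKind;

/* Hypothetical representation of a parsed literal. */
typedef struct {
    LitKind kind;
    bool b;        /* valid when kind == LIT_BOOL */
    double num;    /* valid when kind == LIT_NUMBER */
    char *str;     /* valid when kind == LIT_STRING; malloc'd, NUL-terminated */
    size_t len;    /* number of bytes in str */
} Literal;

/* Copies the next blank-delimited token into buf and advances *cursor.
   Returns false at end of input or if the token does not fit. */
static bool next_token(const char **cursor, char *buf, size_t bufsize) {
    const char *p = *cursor;
    size_t n = 0;
    while (*p == ' ' || *p == '\t') p++;
    while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n') {
        if (n + 1 >= bufsize) return false;
        buf[n++] = *p++;
    }
    buf[n] = '\0';
    *cursor = p;
    return n > 0;
}

/* Parses a token as a base-10 integer in the range lo..hi. */
static bool parse_int_in_range(const char *tok, long lo, long hi, long *out) {
    char *end;
    long v = strtol(tok, &end, 10);
    if (end == tok || *end != '\0' || v < lo || v > hi) return false;
    *out = v;
    return true;
}

/* Parses one <literal>; on success the caller owns out->str, if any. */
static bool parse_literal(const char *text, Literal *out) {
    char tok[64];
    const char *cursor = text;
    assert(text != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    if (!next_token(&cursor, tok, sizeof tok)) return false;
    if (strcmp(tok, "true") == 0)  { out->kind = LIT_BOOL; out->b = true;  return true; }
    if (strcmp(tok, "false") == 0) { out->kind = LIT_BOOL; out->b = false; return true; }
    if (strcmp(tok, "emptylist") == 0) { out->kind = LIT_EMPTYLIST; return true; }
    if (strcmp(tok, "nil") == 0)       { out->kind = LIT_NIL;       return true; }
    if (strcmp(tok, "string") == 0) {
        long len, byte;
        if (!next_token(&cursor, tok, sizeof tok)) return false;
        if (!parse_int_in_range(tok, 0, 1 << 20, &len)) return false; /* arbitrary cap */
        char *s = malloc((size_t) len + 1);
        if (s == NULL) return false;
        for (long i = 0; i < len; i++) {
            if (!next_token(&cursor, tok, sizeof tok) ||
                !parse_int_in_range(tok, 0, 255, &byte)) {
                free(s);
                return false;
            }
            s[i] = (char) byte;
        }
        s[len] = '\0';
        out->kind = LIT_STRING; out->str = s; out->len = (size_t) len;
        return true;
    }
    /* Otherwise the token must be a <number>. */
    char *end;
    double d = strtod(tok, &end);
    if (end == tok || *end != '\0') return false;
    out->kind = LIT_NUMBER; out->num = d;
    return true;
}
```

For example, `parse_literal("string 2 104 105", &lit)` yields a string literal whose bytes spell `hi`. Note that `strtod` is more permissive than the grammar probably intends (it accepts forms such as hexadecimal), so a real parser would likely want a stricter number check.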

The .load form generates a single LoadLiteral instruction, with the function value as the literal.
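Because a `.load ... function` form counts as one instruction but is followed by a body of its own, the line structure of a module is naturally recursive. The following C sketch checks only that structure: it walks a module line by line and confirms that every declared length is matched by enough instructions. It assumes, consistent with the single LoadLiteral instruction mentioned above, that a nested function contributes exactly one instruction to the enclosing body. The functions `skip_body` and `check_module` are hypothetical, and the sketch ignores opcodes and operands entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Consumes `count` instructions starting at lines[*pos], including the
   bodies of any nested functions. Returns false if the input runs out. */
static bool skip_body(const char *const lines[], size_t nlines,
                      size_t *pos, int count) {
    for (int i = 0; i < count; i++) {
        int reg, arity, length;
        if (*pos >= nlines)
            return false;
        const char *line = lines[(*pos)++];
        /* A function literal is one instruction followed by its own body. */
        if (sscanf(line, " .load %d function %d %d", &reg, &arity, &length) == 3) {
            if (length < 0 || !skip_body(lines, nlines, pos, length))
                return false;
        }
    }
    return true;
}

/* Checks one module: a `.load module <length>` line, then its body. */
static bool check_module(const char *const lines[], size_t nlines, size_t *pos) {
    int length;
    if (*pos >= nlines || sscanf(lines[*pos], " .load module %d", &length) != 1)
        return false;
    if (length < 0)
        return false;
    (*pos)++;
    return skip_body(lines, nlines, pos, length);
}
```

Given a module that declares two instructions, the first of which is a function with a two-instruction body, `check_module` consumes five lines in total: the module header, the `.load` line, the two body lines, and the final instruction.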