There’s significant new code to understand this week, all of it related either to parsing or to the instruction table. By the end of the week, you’ll understand most of the C code in the SVM.

Binaries you can run

svm-loader-instructions

Displays all the instructions your loader can parse, together with an unparsing template for each one. It renders the unparsing part of your instruction list in file instructions.c.

svm

This is your Simple Virtual Machine. It reads virtual object code from all the files named on the command line or, if no files are named, from standard input. If you’re using my debugging infrastructure, you may sometimes want to invoke svm using the env command, as in this example:

env SVMDEBUG=decode svm hello.vo

Code you will write or edit

iparsers.c

This file contains a parser for each instruction format that is supported in virtual object code. In lab steps (@parseliteral), (@parseR1LIT), and (@parseR1GLO), you’ll be adding parseR1LIT and parseR1GLO, for which parseR1U16 is a good model. Before you can do that, you’ll need to study the other parsers as well as the token interface in file tokens.h.

loader.c

Contains all the machinery needed to load virtual object code, except functions get_instruction (read one instruction) and loadfun (allocate space for a VMFunction and fill it with instructions). You’ll write get_instruction and loadfun.

instructions.c

The instruction table. For each instruction, includes the external name of its opcode, the internal name of the opcode, a parser, and an unparsing template. The table comes pre-populated with 5 instructions; you’ll add another 10. To understand what’s there you’ll need to look at files itable.h and instructions.h.

A detailed description of the instruction table can be found below.

opcode.h

You’ll be defining new instructions, and you’ll add their opcodes here. I’ve already added LoadLiteral.

Code you will revisit from last week

value.h: You’ll be allocating and initializing VMFunction values, so you’ll need to know their representation.
vmstate.c: You’ll have to make sure the literal_slot function works with more than one literal, and you’ll implement new functions literal_value, global_slot, and global_name.

New code you will look at and understand (parsing)

tokens.h: Representation of a tokenized input line (sequence of tokens) from a .vo file. A token is a name, a machine integer, or a double. Every instruction, literal, and so on is specified by a sequence of tokens.
name.h: Defines the Name abstraction, which is one form of token. This abstraction is called Atom_T in Hanson’s C Interfaces and Implementations and Name in my Programming Languages: Build, Prove, and Compare. By passing a string to function strtoname, we convert it to a Name, and names can be compared using pointer equality. Pointer equality takes constant time, which is great, but the main reason we use Names is that pointer comparison is so much less error-prone than strcmp.
stable.h: Defines the STable_T abstraction. This is a finite map (“table”) from strings to unsigned integers.
iparsers.h: Defines what it means to be an InstructionParser. An instruction parser takes an opcode and a sequence of tokens and returns an instruction.
iformat.h: You’ve seen this header file before, but perhaps you looked only at the decoding functions. This week you’ll use the encoding function eR1U16 to encode instructions that use literals, and you’ll use encoding function eR0 to encode a Halt instruction.
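To make "takes an opcode and a sequence of tokens and returns an instruction" concrete, here is a minimal sketch of a parser for a one-register, 16-bit-field format. It is not the code in iparsers.c: the token type, the function names prefixed with sketch_, and the bit layout (opcode in the top byte, then the register, then the 16-bit field) are all assumptions made for illustration. The real definitions are in tokens.h and iformat.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: the real types live in tokens.h and iformat.h. */
typedef uint32_t SketchInstruction;

typedef struct SketchTokens {
    uint32_t value;                   /* this sketch handles integer tokens only */
    const struct SketchTokens *next;  /* NULL marks the end of the input line */
} SketchTokens;

/* Assumed layout: opcode in bits 31..24, register in bits 23..16,
   16-bit unsigned field in bits 15..0. */
static SketchInstruction sketch_eR1U16(unsigned opcode, unsigned reg, unsigned u16) {
    assert(opcode < 256 && reg < 256 && u16 < 65536);
    return ((uint32_t)opcode << 24) | ((uint32_t)reg << 16) | (uint32_t)u16;
}

/* Consume one integer token and advance *tokensp past it. */
static uint32_t sketch_next_int(const SketchTokens **tokensp) {
    assert(*tokensp != NULL);         /* an operand is missing */
    uint32_t n = (*tokensp)->value;
    *tokensp = (*tokensp)->next;
    return n;
}

/* A parser in the shape the text describes: opcode + tokens -> instruction.
   The encoding function is called from inside the parser. */
static SketchInstruction sketch_parseR1U16(unsigned opcode, const SketchTokens *tokens) {
    uint32_t reg = sketch_next_int(&tokens);
    uint32_t k   = sketch_next_int(&tokens);
    assert(tokens == NULL);           /* no trailing tokens allowed */
    return sketch_eR1U16(opcode, reg, k);
}
```

A parser for a format with a literal or global operand would presumably have the same shape, except that the second operand would be converted to a slot number before encoding.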

New code you will look at and understand (instruction table)

itable.h, instructions.h: File itable.h describes the representation of an entry in the instruction table, and file instructions.h promises the existence of an instruction table. Both will help you understand file instructions.c, which you’ll edit. The instruction table is described in detail below.

New code that may be worth looking at

loader.h: This is the public interface to the loader. If you want to understand the role the loader plays in the system, this is the place to look.

New code that supports debugging, which you will want soon

I’ve written a quick guide to debugging, and there is an example of debugging code in my model vmrun, file src/svm/model/vmrun.c in the git repository. It uses these two interfaces:

svmdebug.h: This file provides general-purpose infrastructure to help you debug your SVM. It looks at an environment variable SVMDEBUG and helps you make decisions (like whether to print and what) based on what you find there. Documentation is in the file.
disasm.h: Exports a function you can call from your vmrun function to print out what is happening when you decode an instruction.
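If you want a feel for what "looks at an environment variable and helps you make decisions" means, here is a minimal sketch built only on the standard getenv function. It is not the svmdebug.h interface: the function names are hypothetical, and treating the value of SVMDEBUG as a comma-separated list of words is an assumption. The documentation in svmdebug.h says what the real infrastructure does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: is `flag` one of the comma-separated words in `setting`?
   (The comma-separated format is an assumption of this sketch.) */
static bool sketch_flag_in(const char *setting, const char *flag) {
    assert(flag != NULL);
    size_t len = strlen(flag);
    if (setting == NULL || len == 0)
        return false;
    for (const char *p = setting; *p != '\0'; ) {
        const char *end = strchr(p, ',');
        size_t wordlen = end ? (size_t)(end - p) : strlen(p);
        if (wordlen == len && strncmp(p, flag, len) == 0)
            return true;              /* exact word match */
        p += wordlen;                 /* skip the word ... */
        if (*p == ',')
            p++;                      /* ... and the comma after it, if any */
    }
    return false;
}

/* Hypothetical helper: consult the SVMDEBUG environment variable. */
static bool sketch_debug_enabled(const char *flag) {
    return sketch_flag_in(getenv("SVMDEBUG"), flag);
}
```

With something like this in hand, a vmrun function could test sketch_debug_enabled("decode") once, before its main loop, and print each instruction as it is decoded only when the test succeeds.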

Things you can probably ignore

I’m trying to shield you from redundant code and from code that implements basic C techniques: data structures, argument processing, and that sort of thing. These things are listed here.

New code you can probably ignore

File svm.c defines the main function that launches an SVM. If you’re curious about how main functions are written, this one uses a classic style, but you don’t otherwise need to look at it. You can also ignore files svm-loader-instructions.c, tokens.c, name.c, and itable.c.

Code from last week you can almost certainly ignore

Nothing has changed in check-expect.h, print.h, vmheap.h, vmrun.h, or vmstring.h.

The instruction table

The instruction table lives in file instructions.c, and its format is defined by the instruction_info record type in file itable.h.¹ Each record is initialized with four fields:

The first field is a C string that identifies the opcode as it appears in virtual object code, like "halt".
The second field is an enumeration literal that identifies the opcode as it appears inside the vmrun function, like Halt.
The third field is a parsing function that is used to parse the instruction’s operands. Such a function, like parseR0, converts an opcode and a sequence of object-code tokens to an instruction. The parsing function is named after the format of the instruction; the halt instruction operates on no registers, so it is parsed with function parseR0 and encoded with function eR0. Another example, print, operates on one register, so it is parsed with function parseR1 and encoded with function eR1.²
The fourth field is an unparsing template, which is used to project an instruction into a string. This particular unparsing is usually called “disassembly”: unparsing a binary-format instruction into something approximating assembly language. The unparsing templates for the print and check instructions will give you a better idea of what unparsing templates can do.

The unparser is called printasm and it’s implemented in file disasm.c.
The fifth field doesn’t appear in file instructions.c; it is not initialized until run time. It holds the same information as the first field, but represented as a Name, so that the itable_entry function can compare opcode names in constant time.

To prevent gcc from bleating about the missing initializer for the fifth field, I’ve added a special compiler pragma to file instructions.c.

The instruction table is dumped by binary svm-loader-instructions. The dump shows the external name of each opcode and the corresponding unparsing template.
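The five fields can be pictured with a small, self-contained sketch. Nothing here is copied from itable.h or instructions.c: every type and function prefixed with Sketch or sketch_ is hypothetical, the template strings are placeholders, the toy parsers take no tokens, and the pragma shown is just one gcc pragma that silences a missing-initializer warning (whether it is the one in instructions.c is an assumption). The sketch shows four fields initialized statically, the fifth filled in at run time, and lookup by pointer equality.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for name.h, opcode.h, iparsers.h, and itable.h. */
typedef const char *SketchName;   /* interned string: equal text, equal pointer */
typedef enum { SketchHalt, SketchPrint } SketchOpcode;
typedef unsigned (*SketchParser)(SketchOpcode op);  /* real parsers also take tokens */

static unsigned sketch_parseR0(SketchOpcode op) { return (unsigned)op << 24; }
static unsigned sketch_parseR1(SketchOpcode op) { return (unsigned)op << 24; }

/* Toy interning: return the same pointer for every string with the same text. */
#define SKETCH_MAX_NAMES 64
static const char *sketch_names[SKETCH_MAX_NAMES];
static size_t sketch_nnames = 0;

static SketchName sketch_strtoname(const char *s) {
    for (size_t i = 0; i < sketch_nnames; i++)
        if (strcmp(sketch_names[i], s) == 0)
            return sketch_names[i];
    assert(sketch_nnames < SKETCH_MAX_NAMES);
    char *copy = malloc(strlen(s) + 1);
    assert(copy != NULL);
    strcpy(copy, s);
    return sketch_names[sketch_nnames++] = copy;
}

typedef struct {
    const char *string;              /* 1: opcode as written in object code */
    SketchOpcode opcode;             /* 2: opcode as seen inside vmrun */
    SketchParser parser;             /* 3: parses the operands */
    const char *unparsing_template;  /* 4: placeholder template syntax */
    SketchName name;                 /* 5: not initialized until run time */
} SketchInfo;

/* Silence gcc's complaint about the fifth field having no initializer. */
#pragma GCC diagnostic ignored "-Wmissing-field-initializers"
static SketchInfo sketch_table[] = {
    { "halt",  SketchHalt,  sketch_parseR0, "halt" },
    { "print", SketchPrint, sketch_parseR1, "print rX" },
};

/* Find the entry for an opcode name, or NULL if there is none. */
static SketchInfo *sketch_itable_entry(SketchName name) {
    size_t n = sizeof sketch_table / sizeof sketch_table[0];
    for (size_t i = 0; i < n; i++) {
        if (sketch_table[i].name == NULL)  /* first use: fill in the fifth field */
            sketch_table[i].name = sketch_strtoname(sketch_table[i].string);
        if (sketch_table[i].name == name)  /* pointer equality, no strcmp */
            return &sketch_table[i];
    }
    return NULL;
}
```

The design point is that strcmp runs only while a name is being interned. After that, every comparison in the lookup loop is a single pointer comparison.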

¹ “Record” is the generic term; struct is the C keyword.
² The encoding function sits inside the parsing function, not in the instruction table, but it is relevant because you’ll need to write a parsing function of your own.