There’s significant new code to understand this week, all of it related either to parsing or to the instruction table. By the end of the week, you’ll understand most of the C code in the SVM.
Binaries you can run
svm-loader-instructions
Displays all the instructions your loader can parse, together with an unparsing template for each one. It renders the unparsing part of your instruction list in file instructions.c.
svm
This is your Simple Virtual Machine. It reads virtual object code from all the files named on the command line or, if no files are named, from standard input. If you’re using my debugging infrastructure, you may sometimes want to invoke svm using the env command, as in this example:
env SVMDEBUG=decode svm hello.vo
Code you will write or edit
iparsers.c
This file contains a parser for each instruction format that is supported in virtual object code. In lab, in steps (@parseliteral), (@parseR1LIT), and (@parseR1GLO), you’ll be adding parseR1LIT and parseR1GLO, for which parseR1U16 is a good model. Before you can do that, you’ll need to study the other parsers as well as the token interface in file tokens.h.
loader.c
Contains all the machinery needed to load virtual object code, except functions get_instruction (read one instruction) and loadfun (allocate space for a VMFunction and fill it with instructions). You’ll write get_instruction and loadfun.
instructions.c
The instruction table. For each instruction, includes the external name of its opcode, the internal name of the opcode, a parser, and an unparsing template. The table comes pre-populated with 5 instructions; you’ll add another 10. To understand what’s there you’ll need to look at files itable.h and instructions.h. A detailed description of the instruction table can be found below.
opcode.h
You’ll be defining new instructions, and you’ll add their opcodes here. I’ve already added LoadLiteral.
Code you will revisit from last week
value.h
You’ll be allocating and initializing VMFunction values, so you’ll need to know their representation.
vmstate.c
You’ll have to make sure the literal_slot function works with more than one literal, and you’ll implement new functions literal_value, global_slot, and global_name.
New code you will look at and understand (parsing)
tokens.h
Representation of a tokenized input line (sequence of tokens) from a .vo file. A token is a name, a machine integer, or a double. Every instruction, literal, and so on is specified by a sequence of tokens.
name.h
Defines the Name abstraction, which is one form of token. This abstraction is called Atom_T in Hanson’s C Interfaces and Implementations and Name in my Programming Languages: Build, Prove, and Compare. By passing a string to function strtoname, we convert it to a Name, and names can be compared using pointer equality. Pointer equality takes constant time, which is great, but the main reason we use Names is that pointer comparison is so much less error-prone than strcmp.
stable.h
Defines the STable_T abstraction. This is a finite map (“table”) from strings to unsigned integers.
iparsers.h
Defines what it means to be an InstructionParser. An instruction parser takes an opcode and a sequence of tokens and returns an instruction.
iformat.h
You’ve seen this header file before, but perhaps you looked only at the decoding functions. This week you’ll use the encoding function eR1U16 to encode instructions that use literals, and you’ll use encoding function eR0 to encode a Halt instruction.
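To make the idea behind Name concrete, here is a minimal sketch of string interning. It is not the code in name.c: the type SketchName and the functions sketch_strtoname and sketch_nametostr are hypothetical stand-ins, and the linear list is a simplification chosen to keep the example short (a real implementation would more likely use a hash table). The point it illustrates is only that equal strings map to the same pointer, so names can be compared with ==.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the Name type in name.h: a pointer to an
   interned record.  Two equal strings always yield the same pointer. */
typedef struct SketchName {
    struct SketchName *next;  /* next record in the list of interned names */
    char text[];              /* the interned characters */
} *SketchName;

static SketchName interned = NULL;  /* every name created so far */

/* Plays the role of strtoname: returns the unique SketchName for s. */
SketchName sketch_strtoname(const char *s) {
    assert(s != NULL);
    for (SketchName n = interned; n != NULL; n = n->next)
        if (strcmp(n->text, s) == 0)
            return n;                  /* seen before: reuse the record */
    SketchName fresh = malloc(sizeof(*fresh) + strlen(s) + 1);
    assert(fresh != NULL);
    strcpy(fresh->text, s);
    fresh->next = interned;            /* remember it for later lookups */
    interned = fresh;
    return fresh;
}

/* Recovers the characters of a name, for example for printing. */
const char *sketch_nametostr(SketchName n) {
    assert(n != NULL);
    return n->text;
}
```

With this sketch, sketch_strtoname("halt") == sketch_strtoname("halt") holds, and the only call to strcmp lives inside the interning function, which is where the error-prone comparison belongs.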
New code you will look at and understand (instruction table)
itable.h, instructions.h
File itable.h describes the representation of an entry in the instruction table, and file instructions.h promises the existence of an instruction table. Both will help you understand file instructions.c, which you’ll edit. The instruction table is described in detail below.
New code that may be worth looking at
loader.h
This is the public interface to the loader. If you want to understand the role the loader plays in the system, this is the place to look.
New code that supports debugging, which you will want soon
I’ve written a quick guide to debugging, and there is an example of debugging code in my model vmrun, file src/svm/model/vmrun.c in the git repository. It uses these two interfaces:
svmdebug.h
This file provides general-purpose infrastructure to help you debug your SVM. It looks at an environment variable SVMDEBUG and helps you make decisions (like whether to print and what) based on what you find there. Documentation is in the file.
disasm.h
Exports a function you can call from your vmrun function to print out what is happening when you decode an instruction.
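As an illustration of environment-driven debugging, here is a minimal sketch of how a program might decide whether to print based on SVMDEBUG. It does not use the svmdebug.h interface, whose actual functions are documented in that file; the helpers sketch_setting_has and sketch_debug_enabled are hypothetical, and the comma-separated format of the variable is an assumption made for the example. Only getenv, strcspn, and strncmp are real library calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of svmdebug.h: reports whether `word`
   appears as one item of the comma-separated string `setting`.
   The comma-separated format is an assumption of this sketch. */
bool sketch_setting_has(const char *setting, const char *word) {
    assert(word != NULL && *word != '\0');
    if (setting == NULL)
        return false;                       /* variable is not set */
    size_t len = strlen(word);
    for (const char *p = setting; *p != '\0'; ) {
        size_t span = strcspn(p, ",");      /* length of the current item */
        if (span == len && strncmp(p, word, len) == 0)
            return true;
        p += span;
        if (*p == ',')
            p++;                            /* skip the separator */
    }
    return false;
}

/* Consults the SVMDEBUG environment variable itself. */
bool sketch_debug_enabled(const char *word) {
    return sketch_setting_has(getenv("SVMDEBUG"), word);
}
```

Under these assumptions, a vmrun function could guard its diagnostic output with a test like sketch_debug_enabled("decode"), which would be true when the program is launched as env SVMDEBUG=decode svm hello.vo.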
Things you can probably ignore
I’m trying to shield you from redundant code and from code that implements basic C techniques: data structures, argument processing, and that sort of thing. These things are listed here.
New code you can probably ignore
File svm.c defines the main function that launches an SVM. If you’re curious about how main functions are written, this one uses a classic style, but you don’t otherwise need to look at it. You can also ignore files svm-loader-instructions.c, tokens.c, name.c, and itable.c.
Code from last week you can almost certainly ignore
Nothing has changed in check-expect.h, print.h, vmheap.h, vmrun.h, or vmstring.h.
The instruction table
The instruction table lives in file instructions.c, and its format is defined by the instruction_info record type in file itable.h.1 Each record is initialized with four fields:
The first field is a C string that identifies the opcode as it appears in virtual object code, like "halt".
The second field is an enumeration literal that identifies the opcode as it appears inside the vmrun function, like Halt.
The third field is a parsing function that is used to parse the instruction’s operands. Such a function, like parseR0, converts an opcode and a sequence of object-code tokens to an instruction. The parsing function is named after the format of the instruction; the halt instruction operates on no registers, so it is parsed with function parseR0 and encoded with function eR0. Another example, print, operates on one register, so it is parsed with function parseR1 and encoded with function eR1.2
The fourth field is an unparsing template, which is used to project an instruction into a string. This particular unparsing is usually called “disassembly”: unparsing a binary-format instruction into something approximating assembly language. The unparsing templates for the print and check instructions will give you a better idea of what unparsing templates can do. The unparser is called printasm, and it’s implemented in file disasm.c.
The fifth field doesn’t appear in file instructions.c; it is not initialized until run time. It holds the same information as the first field, but represented as a Name, so that the itable_entry function can compare opcode names in constant time. To prevent gcc from bleating about the missing field, I’ve added a special compiler pragma to file instructions.c.
The instruction table is dumped by binary svm-loader-instructions. The dump shows the external name of each opcode and the corresponding unparsing template.
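To show the shape of such a table, here is a minimal, self-contained sketch with two entries. Everything in it is hypothetical: the names (SketchInfo, sketch_table, sketch_itable_entry, and so on), the 32-bit encoding with the opcode in the top byte, the template strings, and the use of a plain string in place of a token sequence are all assumptions for illustration, not the contents of itable.h or instructions.c. The lookup uses strcmp to stay short, where the real itable_entry compares Names. The pragma shown is a real gcc diagnostic pragma, though I am only assuming it resembles the one in instructions.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum { SketchHalt, SketchPrint } SketchOpcode;
typedef uint32_t SketchInstruction;

/* A parser turns an opcode plus operands into an instruction.  The real
   parsers take a token sequence; a string stands in for it here. */
typedef SketchInstruction (*SketchParser)(SketchOpcode op, const char *operands);

/* Made-up encoding: opcode in the top byte, register in the low byte. */
static SketchInstruction sketch_parseR0(SketchOpcode op, const char *operands) {
    (void)operands;                       /* format R0 has no operands */
    return (SketchInstruction)op << 24;
}

static SketchInstruction sketch_parseR1(SketchOpcode op, const char *operands) {
    unsigned long reg = strtoul(operands, NULL, 10);
    return ((SketchInstruction)op << 24) | (SketchInstruction)(reg & 0xFF);
}

typedef struct {
    const char *string;     /* field 1: opcode name in virtual object code */
    SketchOpcode opcode;    /* field 2: opcode as the interpreter sees it */
    SketchParser parser;    /* field 3: parses the operands */
    const char *unparsing;  /* field 4: template used for disassembly */
    const void *name;       /* field 5: stand-in for a Name, set at run time */
} SketchInfo;

/* Field 5 is deliberately left out of the initializers below, so quiet
   the warning gcc would otherwise give about it. */
#pragma GCC diagnostic ignored "-Wmissing-field-initializers"
static SketchInfo sketch_table[] = {
    { "halt",  SketchHalt,  sketch_parseR0, "halt" },
    { "print", SketchPrint, sketch_parseR1, "print rX" },
};

/* Finds the entry for an opcode name, or NULL if there is none. */
SketchInfo *sketch_itable_entry(const char *opcode_name) {
    size_t n = sizeof(sketch_table) / sizeof(sketch_table[0]);
    for (size_t i = 0; i < n; i++)
        if (strcmp(sketch_table[i].string, opcode_name) == 0)
            return &sketch_table[i];
    return NULL;
}
```

In this sketch, loading a line that begins with print would mean looking up the entry with sketch_itable_entry("print") and then calling its parser field on the opcode and the remaining operands. Keeping the parser in the table is what lets a single loader loop handle every instruction format.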