Overview of code in src/svm
All the code in the src/svm
directory of the student code repository is mentioned below. The code you’ll work with is divided into three categories:
- Code you will write or edit
- Code you will look at and understand
- Code you will look at eventually
There’s also code you may never look at.
In each category, the most important code is listed first.
Code you will write or edit
vmstate.h
Using ideas from the operational semantics, you’ll define the C representation of your SVM’s state. (A 20-line template is supplied.)
vmstate.c
You’ll manage memory: allocate, initialize, and free. And you’ll add values to the literal pool. (A 50-line template is supplied.)
vmrun.c
You’ll write a
vmrun
function that runs code on the VM. The version for this module will need to understand only a handful of instructions. (A 40-line template is supplied.)opcode.h
List of opcodes your
vmrun
function will implement. It’s pre-populated with opcodes forHalt
,Print
,Check
, andExpect
. Over the first six weeks of the course, you’ll add many more. (A 10-line initial version is supplied.)
Code you will look at and understand
When implementing your vmrun
function, you’ll need these interfaces:
value.h
Code that defines all the values that VM code can work with. You’ll use it when designing and implementing your VM instruction set—especially the embedding and projection functions. The code is explained in detail below. (160 lines of declarations, plus another 100 lines of
static inline
functions)vmrun.h
The specification for your
vmrun
function (10 lines).iformat.h
Functions for encoding and decoding VM instructions, as mentioned in the handout on instruction formats. In this module, you only decode instructions; encoding is for module 2. These functions are all special cases of the
Bitpack
interface that you may have implemented in CS 40. (30 lines, plus 10 lines ofstatic inline
functions)check-expect.h
Functions
check
andexpect
, which you will call from yourCheck
andExpect
instructions. The “protocol” in the comment says that you have to callcheck
before callingexpect
, and after a call tocheck
, client code most callexpect
before it would be OK to callcheck_assert
orreport_unit_tests
. (20 lines)vmerror.h
Function
runerror
, which callsfprintf
and then signals a checked run-time error. You’ll call it if yourvmrun
function encounters a machine opcode that it doesn’t recognize (or doesn’t implement). Also includestyperror
, which handles the common special case where a value has the wrong type—you might call it if a machine instruction tries to add two Booleans, for example. (15 lines)
Code you will look at eventually, but perhaps not this week
print.h
An extensible printer for use in debugging. Defines functions
print
andfprint
, which are similar toprintf
andfprintf
. They don’t do any of the fancy formatting, but they can printValue
s directly. Example:static Value testval; // initialized to 0 fprint(stderr, "Test value should be nil," " is %v\n", testval);
The printer is adapted from my book Programming Languages: Build, Prove, and Compare, and it is described in that book’s Supplement. (70 lines, almost none of which you will ever use)
svm-test.c
This function defines
main
, which makes a few test calls tovmrun
. This code will provide a starting point when you want to create your ownmain
function next week, to run the SVM for real. (40 lines)testfuns.h
,testfuns.c
This code creates three functions that are used to test your
vmrun
. There is not much point looking at it now, but thetestfuns.c
will be useful to look at next week—you will be writing an object-code loader that will create functions using the same API calls as are used here. (20 lines and 40 lines)vmheap.h
,vmheap.c
Provide macros
VMNEW
andVMNEWC
, which are used to declare and initialize C pointer variables. These macros callvmalloc_raw
andvmcalloc_raw
, which have the same interface asmalloc
andcalloc
, except they are guaranteed never to returnNULL
. You’ll needVMNEW
when you want to implement acons
instruction. (30 lines and 130 lines)In module 11, we’ll extend this code with support for garbage collection. Until then, there is no point in looking at the implementation.
You can go a long time without these:
vmstring.h
Virtual-machine strings, which have a different representation from C strings. (50 lines)
Code you may never look at
There are implementations of stuff you already know how to do (iformat.c
, vtable.c
), some general-purpose infrastructure for printing (print.c
, printbuf.c
) or unit testing (check-expect.c
), and some straightforward implementations of simple interfaces (value.c
, vmerror.c
). None of these is really worth your time.
The one implementation that’s interesting is in vmstring.c
. That implementation contains a classic optimization that make it very efficient to use short strings as keys in hash tables. Since we use string-indexed hash tables primarily in the loader, this optimization isn’t terribly crucial for us, so we won’t spend any time on it. (Besides, the code is really more data structures than programming languages.)
Detailed explanation of value.h
The set of values that the VM supports drives the entire design. So that your VM can be used for languages beyond vScheme, I have deliberately chosen an expansive set. Most things that you could want are supported natively, with just two exception: objects and bignums. Objects can be simulated using tables, and bignums add too much complexity to the implementation—they are a depth option.
The representations of values
Lines 13 to 63 define Value
as a tagged union. In ML or Haskell, it would be an algebraic data type. The VTag
is the tag, and the Value
contains both a tag and a payload. The payload’s type depends on the tag, which is why the payloads sit in a union.
Tags fall into four groups; for this module, you need only consider the first two groups.
The first group (line 15) holds only
Nil
. In vScheme,Nil
is the value of a variable that is not otherwise defined. This is effectively the samenil
that you find in Lua or Smalltalk.The
Nil
tag is deliberately coded as tag 0. As a consequence, aValue
that is initialized to all zeros, as bycalloc
for example, is a well-formednil
.The second group (lines 19–26) holds tags for the basic Scheme values: Booleans, numbers, symbols, lists, and functions. Lists come in two forms (
Emptylist
andConsCell
) and so do functions (VMFunction
andVMClosure
). Functions are separated by a blank line because we won’t do much with them until the second half of the course, at which point we’ll do both first-order and higher-order functions in rapid succession. (But yourvmrun
function will take aVMFunction
as an argument.)The third group (lines 31–33) supports some depth goals:
Block
for records whose size is known at compile time,Seq
for Hanson-style arrays, andTable
for hash tables. All of these are optional.The fourth group (lines 38 and 39) supports another depth goal: C-language extensions to the base language. These are the representations you would use if you wanted to import file I/O functions or string-library functions implemented natively in C.
The tagnames
on line 42 map each tag to a C string. I’ve chosen to keep the code simple, with the sad consequence that it’s easy for the tags and the strings to get out of sync.
The struct Value
with its payloads is defined on lines 45 to 63. Note that Value
is not a pointer type! A Value
is meant to correspond to a single VM register, and it is not allocated on the VM heap. Only big payloads (strings, blocks, and so on) are allocated on the VM heap.
Embedding, projection, and other value functions
Because the representation of Value
is completely exposed, there is no real need to define any functions on it. But without some convenience functions, our thoughts would drown in a sea of assignment statements and tag checks. The functions I provide are organized by Barbara Liskov’s classifications, which divides them into creators, producers, mutators, and observers. (See Chapter 2 of my book.) They lean heavily on embedding and projection:
Creators (lines 67 to 80) make new values; almost all of them are embedding functions that embed C values (example: mkBooleanValue). Values
nil
and'()
don’t require creation functions; they arenilValue
andemptylistValue
.Producers would make values from other values, but because the representation is exposed, there’s no point in defining producers—that’s the role of instructions like
cons
.Mutators change the state of a value, but because the
Value
type is immutable, there aren’t any mutators. Some of the payloads are mutable (example:VTable_T
), but those payloads have their own.h
files.Observers get information out of values. Most of the observers are projection functions (lines 84 to 103), and because projection can fail, they take arguments that can be used to generate an error message and a stack trace. The macro versions fill in the file name and line number at the call site. For example, to get a string to pass to the
check
function, you’re meant to use macroAS_CSTRING
, which you pass aValue
and a pointer to a VM state.I’ve deliberately not supplied a projection function for Booleans. In \(\mu\)Scheme, any value can be project to a Boolean: the value
#f
projects as falsehood, and all other values project as truth. In vScheme, the valuenil
(not found in \(\mu\)Scheme) also projects as false. But you can define Boolean projection any way you like. You can even define more than one, and you can use different projections for different instructions. Some instructions might want to accept any value as “truthy” or “falsey”, and others might insist on receiving only honest Booleans. The choice is yours.Not all observers are projection functions; others are found on lines 107 to 112. Any value can be hashed, and there are three (!) different ways to compare values for equality.
Representations of payloads
Some payloads have representations defined on lines 114 to 156. Notably, the payload for a ConsCell
is a block with two slots. You’ll need that to implement cons
, car
, and cdr
. The rest can be ignored until module 7.
Implementation
From line 159 onward, the value.h
file contains implementations of the embedding and projection functions. Embedding and projection is essential to the implementation of most VM instructions, so these functions are all defined static inline
.