Overview of code in src/svm

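All the code in the src/svm directory of the student code repository is mentioned below. The code you’ll work with is divided into three categories: code you will write or edit, code you will look at and understand, and code you will look at eventually, but perhaps not this week.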

There’s also code you may never look at.

In each category, the most important code is listed first.

Code you will write or edit

vmstate.h

Using ideas from the operational semantics, you’ll define the C representation of your SVM’s state. (A 20-line template is supplied.)

vmstate.c

You’ll manage memory: allocate, initialize, and free. And you’ll add values to the literal pool. (A 50-line template is supplied.)
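To make “allocate, initialize, and free” concrete, here is a minimal sketch of a state lifecycle with a tiny literal pool. Everything in it is an assumption made for illustration: the names ToyVMState, toy_newstate, toy_freestate, and toy_literal_slot, the fixed pool capacity, and the use of doubles in place of Values. Your own vmstate.h and vmstate.c should follow the supplied templates and the operational semantics, not this sketch.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_NLITERALS 16  // invented capacity; a real pool might grow instead

struct ToyVMState {
    double literals[TOY_NLITERALS];  // literal pool (doubles stand in for Values)
    int nliterals;                   // number of pool slots in use
};

// Allocate a state with every field zeroed; aborts rather than return NULL.
struct ToyVMState *toy_newstate(void) {
    struct ToyVMState *vm = calloc(1, sizeof(*vm));
    if (vm == NULL)
        abort();
    return vm;
}

// Free the state and null out the caller's pointer to catch accidental reuse.
void toy_freestate(struct ToyVMState **vmp) {
    assert(vmp != NULL && *vmp != NULL);
    free(*vmp);
    *vmp = NULL;
}

// Add a literal to the pool and return the index of the slot it occupies.
int toy_literal_slot(struct ToyVMState *vm, double literal) {
    assert(vm->nliterals < TOY_NLITERALS);
    vm->literals[vm->nliterals] = literal;
    return vm->nliterals++;
}
```

Returning the slot index is one plausible design: an instruction can then name a literal by a small integer instead of carrying the literal itself.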

vmrun.c

You’ll write a vmrun function that runs code on the VM. The version for this module will need to understand only a handful of instructions. (A 40-line template is supplied.)

opcode.h

List of opcodes your vmrun function will implement. It’s pre-populated with opcodes for Halt, Print, Check, and Expect. Over the first six weeks of the course, you’ll add many more. (A 10-line initial version is supplied.)
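A function that runs VM code is usually a loop that fetches an instruction, decodes it, and switches on the opcode. The sketch below shows only that shape. Its opcodes (TOY_HALT, TOY_LOADI, TOY_ADD), its integer registers, and its instruction layout are all invented for illustration; they are not the opcodes in opcode.h, and the real specification of vmrun is the one in vmrun.h.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TOY_HALT, TOY_LOADI, TOY_ADD } ToyOpcode;  // invented opcodes

typedef struct {
    int32_t regs[256];  // one register per possible 8-bit register number
} ToyVM;

// Assumed instruction layout, high bits to low: opcode(8) | X(8) | Y(8) | Z(8)
static inline uint32_t toy_instr(ToyOpcode op, uint8_t x, uint8_t y, uint8_t z) {
    return ((uint32_t)op << 24) | ((uint32_t)x << 16) | ((uint32_t)y << 8) | z;
}

// Run instructions starting at code[0] until a Halt is reached.
void toy_run(ToyVM *vm, const uint32_t *code) {
    for (const uint32_t *pc = code; ; pc++) {
        uint32_t i = *pc;
        uint8_t x = (i >> 16) & 0xff, y = (i >> 8) & 0xff, z = i & 0xff;
        switch (i >> 24) {
        case TOY_HALT:
            return;
        case TOY_LOADI:  // rX := Y, with Y used as a small immediate
            vm->regs[x] = y;
            break;
        case TOY_ADD:    // rX := rY + rZ
            vm->regs[x] = vm->regs[y] + vm->regs[z];
            break;
        default:         // stand-in for reporting a run-time error
            assert(0 && "unrecognized opcode");
        }
    }
}
```

Adding an instruction to a loop like this one means adding one enumerator and one case, which is why the list of opcodes can start small and grow over several weeks.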

Code you will look at and understand

When implementing your vmrun function, you’ll need these interfaces:

value.h

Code that defines all the values that VM code can work with. You’ll use it when designing and implementing your VM instruction set—especially the embedding and projection functions. The code is explained in detail below. (160 lines of declarations, plus another 100 lines of static inline functions)

vmrun.h

The specification for your vmrun function (10 lines).

iformat.h

Functions for encoding and decoding VM instructions, as mentioned in the handout on instruction formats. In this module, you only decode instructions; encoding is for module 2. These functions are all special cases of the Bitpack interface that you may have implemented in CS 40. (30 lines, plus 10 lines of static inline functions)
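The remark about Bitpack can be illustrated with a short sketch: one general field-extraction function, plus decoders that are special cases of it. The names toy_getu, toy_opcode, toy_regX, toy_regY, and toy_regZ are hypothetical, and the layout (an 8-bit opcode followed by three 8-bit register fields) is an assumption; the real layout is the one in the handout on instruction formats and in iformat.h.

```c
#include <assert.h>
#include <stdint.h>

// General extraction in the style of a bit-packing interface: return the
// unsigned field of `word` that is `width` bits wide and whose least
// significant bit sits at position `lsb`.
static inline uint32_t toy_getu(uint32_t word, unsigned width, unsigned lsb) {
    assert(width > 0 && width <= 32 && lsb + width <= 32);
    uint32_t mask = width == 32 ? UINT32_MAX : (((uint32_t)1 << width) - 1);
    return (word >> lsb) & mask;
}

// Decoders as special cases, under the assumed layout
// opcode(8) | X(8) | Y(8) | Z(8), high bits to low.
static inline uint32_t toy_opcode(uint32_t i) { return toy_getu(i, 8, 24); }
static inline uint32_t toy_regX(uint32_t i)   { return toy_getu(i, 8, 16); }
static inline uint32_t toy_regY(uint32_t i)   { return toy_getu(i, 8, 8); }
static inline uint32_t toy_regZ(uint32_t i)   { return toy_getu(i, 8, 0); }
```

Because each decoder fixes the width and position at compile time, a compiler can typically reduce it to a single shift and mask.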

check-expect.h

Functions check and expect, which you will call from your Check and Expect instructions. The “protocol” in the comment says that you have to call check before calling expect, and after a call to check, client code must call expect before it would be OK to call check_assert or report_unit_tests. (20 lines)
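A protocol like this one amounts to a small state machine: a check is either pending or it is not. The sketch below models that with a single flag. The functions toy_check, toy_expect, and toy_ok_to_report are hypothetical, and they take plain ints; the real signatures and behavior are whatever check-expect.h declares.

```c
#include <assert.h>
#include <stdbool.h>

static int  toy_pending_value;          // value saved by the last toy_check
static bool toy_check_pending = false;  // true between toy_check and toy_expect
static int  toy_passed = 0;             // tests whose values matched
static int  toy_total  = 0;             // tests completed

// First half of a test: remember the value that was computed.
void toy_check(int value) {
    toy_pending_value = value;
    toy_check_pending = true;
}

// Second half: compare against the expected value and record the outcome.
void toy_expect(int expected) {
    assert(toy_check_pending);  // protocol: check must come before expect
    toy_check_pending = false;
    toy_total++;
    if (toy_pending_value == expected)
        toy_passed++;
}

// Reporting is allowed only when no check is waiting for its expect.
bool toy_ok_to_report(void) {
    return !toy_check_pending;
}
```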

vmerror.h

Function runerror, which calls fprintf and then signals a checked run-time error. You’ll call it if your vmrun function encounters a machine opcode that it doesn’t recognize (or doesn’t implement). Also includes typerror, which handles the common special case where a value has the wrong type—you might call it if a machine instruction tries to add two Booleans, for example. (15 lines)

Code you will look at eventually, but perhaps not this week

print.h

An extensible printer for use in debugging. Defines functions print and fprint, which are similar to printf and fprintf. They don’t do any of the fancy formatting, but they can print Values directly. Example:

static Value testval; // initialized to 0
fprint(stderr, "Test value should be nil,"
               " is %v\n", testval);

The printer is adapted from my book Programming Languages: Build, Prove, and Compare, and it is described in that book’s Supplement. (70 lines, almost none of which you will ever use)

svm-test.c

This file defines main, which makes a few test calls to vmrun. This code will provide a starting point when you want to create your own main function next week, to run the SVM for real. (40 lines)

testfuns.h, testfuns.c

This code creates three functions that are used to test your vmrun. There is not much point in looking at it now, but testfuns.c will be useful to look at next week: you will be writing an object-code loader that will create functions using the same API calls as are used here. (20 lines and 40 lines)

vmheap.h, vmheap.c

Provide macros VMNEW and VMNEWC, which are used to declare and initialize C pointer variables. These macros call vmalloc_raw and vmcalloc_raw, which have the same interface as malloc and calloc, except they are guaranteed never to return NULL. You’ll need VMNEW when you want to implement a cons instruction. (30 lines and 130 lines)
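The idea of a macro that both declares and initializes a pointer variable can be sketched as follows. This is not the code in vmheap.h: the names TOY_NEW, TOY_NEWC, toy_alloc_raw, and toy_calloc_raw are hypothetical, the macro parameters are a guess, and the sketch allocates with plain malloc and calloc instead of a VM heap.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

// Like malloc, but never returns NULL: on failure, report and abort.
// A zero-byte request is rounded up so that NULL always means failure.
static void *toy_alloc_raw(size_t size) {
    void *p = malloc(size > 0 ? size : 1);
    if (p == NULL) {
        fprintf(stderr, "out of memory\n");
        abort();
    }
    return p;
}

// Like calloc (zeroed memory), with the same never-NULL guarantee.
static void *toy_calloc_raw(size_t n, size_t size) {
    void *p = (n == 0 || size == 0) ? calloc(1, 1) : calloc(n, size);
    if (p == NULL) {
        fprintf(stderr, "out of memory\n");
        abort();
    }
    return p;
}

// Declare pointer variable `var` of type `type`, pointing at fresh memory.
#define TOY_NEW(type, var, size)     type var = toy_alloc_raw(size)
#define TOY_NEWC(type, var, n, size) type var = toy_calloc_raw((n), (size))

struct ToyPair { int first, second; };  // example payload for the macros
```

A call such as `TOY_NEWC(struct ToyPair *, p, 1, sizeof(*p));` then declares p and leaves it pointing at zeroed memory, with no NULL check needed at the call site.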

In module 11, we’ll extend this code with support for garbage collection. Until then, there is no point in looking at the implementation.

You can go a long time without these:

vmstring.h

Virtual-machine strings, which have a different representation from C strings. (50 lines)

Code you may never look at

There are implementations of stuff you already know how to do (iformat.c, vtable.c), some general-purpose infrastructure for printing (print.c, printbuf.c) or unit testing (check-expect.c), and some straightforward implementations of simple interfaces (value.c, vmerror.c). None of these is really worth your time.

The one implementation that’s interesting is in vmstring.c. That implementation contains a classic optimization that makes it very efficient to use short strings as keys in hash tables. Since we use string-indexed hash tables primarily in the loader, this optimization isn’t terribly crucial for us, so we won’t spend any time on it. (Besides, the code is really more data structures than programming languages.)

Detailed explanation of value.h

The set of values that the VM supports drives the entire design. So that your VM can be used for languages beyond vScheme, I have deliberately chosen an expansive set. Most things that you could want are supported natively, with just two exceptions: objects and bignums. Objects can be simulated using tables, and bignums add too much complexity to the implementation; they are a depth option.

The representations of values

Lines 13 to 63 define Value as a tagged union. In ML or Haskell, it would be an algebraic data type. The VTag is the tag, and the Value contains both a tag and a payload. The payload’s type depends on the tag, which is why the payloads sit in a union.
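If tagged unions are new to you, a cut-down sketch may help. It has three invented tags and is far smaller than the real Value; the names ToyTag, ToyValue, and toy_same_value are hypothetical. What it shows is the discipline: read the tag first, and touch only the union member that the tag says is valid.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ToyNil, ToyBoolean, ToyNumber } ToyTag;  // invented tags

typedef struct {
    ToyTag tag;      // says which union member, if any, is meaningful
    union {
        bool b;      // valid only when tag == ToyBoolean
        double n;    // valid only when tag == ToyNumber
    } payload;       // ToyNil carries no payload at all
} ToyValue;

// Where ML or Haskell would pattern match, C switches on the tag.
static inline bool toy_same_value(ToyValue a, ToyValue b) {
    if (a.tag != b.tag)
        return false;
    switch (a.tag) {
    case ToyNil:     return true;
    case ToyBoolean: return a.payload.b == b.payload.b;
    case ToyNumber:  return a.payload.n == b.payload.n;
    }
    return false;  // not reached for a well-formed tag
}
```

In this sketch the nil tag is deliberately the first enumerator, so a zero-initialized ToyValue is nil.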

Tags fall into four groups; for this module, you need only consider the first two groups.

The tagnames on line 42 map each tag to a C string. I’ve chosen to keep the code simple, with the sad consequence that it’s easy for the tags and the strings to get out of sync.
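To see how tags and strings can drift apart, consider this sketch of the parallel-array idiom. It is not what value.h does: the names are hypothetical, and the counting enumerator TOY_NTAGS and the static assertion are an optional guard added here for illustration. The guard catches a missing or extra string, but it cannot catch strings listed in the wrong order.

```c
#include <assert.h>
#include <string.h>

// TOY_NTAGS is not a real tag; it only counts the tags that precede it.
typedef enum { ToyNil, ToyBoolean, ToyNumber, TOY_NTAGS } ToyTag;

// Must list one string per tag, in the same order as the enumeration.
static const char *toy_tagnames[] = { "nil", "boolean", "number" };

// Compile-time guard (C11): fails to compile if the counts disagree.
_Static_assert(sizeof(toy_tagnames) / sizeof(toy_tagnames[0]) == TOY_NTAGS,
               "toy_tagnames must have exactly one string per tag");

// Map a tag to its name, refusing tags that are out of range.
static inline const char *toy_tagname(ToyTag t) {
    assert((unsigned)t < TOY_NTAGS);
    return toy_tagnames[t];
}
```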

The struct Value with its payloads is defined on lines 45 to 63. Note that Value is not a pointer type! A Value is meant to correspond to a single VM register, and it is not allocated on the VM heap. Only big payloads (strings, blocks, and so on) are allocated on the VM heap.

Embedding, projection, and other value functions

Because the representation of Value is completely exposed, there is no real need to define any functions on it. But without some convenience functions, our thoughts would drown in a sea of assignment statements and tag checks. The functions I provide are organized by Barbara Liskov’s classification, which divides them into creators, producers, mutators, and observers. (See Chapter 2 of my book.) They lean heavily on embedding and projection.

Representations of payloads

Some payloads have representations defined on lines 114 to 156. Notably, the payload for a ConsCell is a block with two slots. You’ll need that to implement cons, car, and cdr. The rest can be ignored until module 7.
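Here is a minimal sketch of a cons cell represented as a block with two slots, with cons, car, and cdr written as plain C functions. The type and function names are hypothetical, the block layout is a guess, and the sketch allocates with malloc where real code would allocate on the VM heap.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { ToyNil, ToyNumber, ToyConsCell } ToyTag;  // invented tags

struct ToyBlock;  // heap-allocated payload, defined below

typedef struct {
    ToyTag tag;
    union {
        double n;                // valid when tag == ToyNumber
        struct ToyBlock *block;  // valid when tag == ToyConsCell
    } payload;
} ToyValue;

struct ToyBlock {
    int nslots;         // number of elements in slots
    ToyValue slots[];   // flexible array member: the slots follow the header
};

static ToyValue toy_nil(void) {
    ToyValue v = { .tag = ToyNil };
    return v;
}

// cons: allocate a two-slot block; slot 0 holds the car, slot 1 the cdr.
static ToyValue toy_cons(ToyValue car, ToyValue cdr) {
    struct ToyBlock *b = malloc(sizeof(*b) + 2 * sizeof(b->slots[0]));
    if (b == NULL)
        abort();
    b->nslots = 2;
    b->slots[0] = car;
    b->slots[1] = cdr;
    ToyValue v;
    v.tag = ToyConsCell;
    v.payload.block = b;
    return v;
}

static ToyValue toy_car(ToyValue v) {
    assert(v.tag == ToyConsCell);  // stand-in for a run-time type error
    return v.payload.block->slots[0];
}

static ToyValue toy_cdr(ToyValue v) {
    assert(v.tag == ToyConsCell);
    return v.payload.block->slots[1];
}
```

Copying a ToyValue copies only the tag and the pointer, so two copies of a cons-cell value share one block. That is the sense in which a value fits in a register while its big payload lives on the heap.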

Implementation

From line 159 onward, the value.h file contains implementations of the embedding and projection functions. Embedding and projection are essential to the implementation of most VM instructions, so these functions are all defined static inline.
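To show why embedding and projection turn up in nearly every instruction, here is a minimal sketch of an addition written in terms of them. The names toy_mkNumber, toy_asNumber, and toy_add are hypothetical, and the assertion stands in for a proper run-time type error; the real functions are the ones in value.h.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ToyNil, ToyBoolean, ToyNumber } ToyTag;  // invented tags

typedef struct {
    ToyTag tag;
    union { bool b; double n; } payload;
} ToyValue;

// Embedding: C double -> VM value. Always succeeds.
static inline ToyValue toy_mkNumber(double n) {
    ToyValue v;
    v.tag = ToyNumber;
    v.payload.n = n;
    return v;
}

// Projection: VM value -> C double. Fails when the tag is wrong.
static inline double toy_asNumber(ToyValue v) {
    assert(v.tag == ToyNumber);  // stand-in for a run-time type error
    return v.payload.n;
}

// The body of an "add" instruction: project both operands,
// compute in C, and embed the result.
static inline ToyValue toy_add(ToyValue a, ToyValue b) {
    return toy_mkNumber(toy_asNumber(a) + toy_asNumber(b));
}
```

Functions this small are called on almost every instruction the VM executes, which is the usual argument for defining them static inline in a header.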