Garbage-Collector Interactions

The garbage collector and the VM state

During garbage collection, objects on the heap move. When an object moves, the garbage collector adjusts every pointer to the object so that it points to the new location. To adjust pointers successfully, the collector needs access to the entire VM state. According to the operational semantics, that state is \[\langle I_1 \bullet I_2, R, L, G, S, \sigma\rangle.\] If you care about efficiency, you probably have (at least) the instruction stream \(I_1 \bullet I_2\), the program counter \(\bullet\), and the register-window pointer \(R\) stored in local variables of vmrun, not in the struct VMState. But the garbage collector lives in file vmheap.c and doesn’t have access to those local variables. What to do? The solution is to treat the local variables as a cache.

While the mutator (that is, the VM code that you care about) is running, part of the VM state should be kept in local variables of vmrun. But before the garbage collector runs, that part of the state (the relevant local variables) should be saved in the struct VMState, leaving the entire VM state on the C heap and visible to the garbage collector. And after the garbage collector terminates, the partial VM state, which may have changed, should be loaded from the struct VMState back into the relevant local variables.

To perform the save and load operations, I recommend defining macros VMSAVE and VMLOAD.
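As a rough illustration of the cache idea, here is one way such macros might look. Everything beyond the names VMSAVE, VMLOAD, vmrun, and struct VMState is an assumption made for the example: the fields code, pc, and window, the locals pc and reg0, and the stand-in collector pretend_gc are hypothetical. The sketch saves the program counter as an offset, on the assumption that the collector may move the instruction stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Instruction;                        /* hypothetical encoding */
typedef struct { int tag; void *payload; } Value;    /* hypothetical layout */

enum { NUM_REGISTERS = 64 };

struct VMState {
    Instruction *code;              /* start of the instruction stream */
    size_t pc;                      /* saved program counter, as an offset */
    Value *window;                  /* saved register-window pointer */
    Value registers[NUM_REGISTERS];
};

/* Both macros assume locals named vm, pc, and reg0 are in scope. */
#define VMSAVE() \
    do { vm->pc = (size_t)(pc - vm->code); vm->window = reg0; } while (0)
#define VMLOAD() \
    do { pc = vm->code + vm->pc; reg0 = vm->window; } while (0)

static Instruction moved_code[4];

/* Stand-in for a collection that relocates the instruction stream. */
void pretend_gc(struct VMState *vm) {
    for (size_t i = 0; i < 4; i++)
        moved_code[i] = vm->code[i];
    vm->code = moved_code;
}

/* Returns 1 if the cached state is correct after a save/collect/load cycle. */
int demo_save_load(void) {
    static Instruction program[4] = { 10, 20, 30, 40 };
    struct VMState state = { .code = program, .pc = 0 };
    struct VMState *vm = &state;
    vm->window = vm->registers;

    Instruction *pc;
    Value *reg0;
    VMLOAD();            /* fill the cache on entry */
    pc += 2;             /* the mutator executes two instructions */
    reg0 += 8;           /* a call slides the register window */

    VMSAVE();            /* write the cache back before collecting */
    pretend_gc(vm);
    VMLOAD();            /* refill: the state may have changed */

    return pc == moved_code + 2 && *pc == 30 && reg0 == vm->registers + 8;
}
```

In this sketch the macros take no arguments and rely on fixed local names, which keeps each call site to a single line at the cost of coupling the macros to vmrun.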

The garbage collector and the VM heap

The garbage collector doesn’t manage space for VM values. Values are stored in VM registers or even in local variables of C functions. But a value can point to a memory block allocated on the VM heap; that memory block is the value’s payload.
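To make the distinction between a value and its payload concrete, here is a minimal sketch. The types Block and Value, the to_space array, and the functions forward_block and forward_value are hypothetical names chosen for the example, and only one kind of payload is modeled. The point it illustrates is that the value itself stays where it is (in a register or a C local), while only its payload pointer is adjusted when the block moves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A payload: a memory block that lives on the VM heap. */
typedef struct Block {
    struct Block *forwarded;   /* NULL until the collector has copied the block */
    char text[16];
} Block;

typedef enum { Nil, Number, String } Tag;

/* A value: small, stored in a register or a C local, never on the VM heap. */
typedef struct {
    Tag tag;
    union { double number; Block *block; } u;
} Value;

static Block to_space[8];
static size_t next_free = 0;

/* Copy a block to to-space at most once; return its new address. */
Block *forward_block(Block *old) {
    if (old->forwarded == NULL) {
        assert(next_free < sizeof to_space / sizeof to_space[0]);
        Block *copy = &to_space[next_free++];
        *copy = *old;
        old->forwarded = copy;
    }
    return old->forwarded;
}

/* The value stays where it is; only its payload pointer is adjusted. */
void forward_value(Value *v) {
    if (v->tag == String)
        v->u.block = forward_block(v->u.block);
}

/* Two values share one payload; returns the number of blocks copied. */
size_t demo_shared_payload(void) {
    Block from_space_block = { .forwarded = NULL, .text = "hello" };
    Value a = { .tag = String, .u.block = &from_space_block };
    Value b = a;
    next_free = 0;
    forward_value(&a);
    forward_value(&b);
    assert(a.u.block == b.u.block && a.u.block == &to_space[0]);
    assert(strcmp(a.u.block->text, "hello") == 0);
    return next_free;
}
```

Because the old block remembers where it was copied, two values that share a payload before collection still share it afterward.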

The garbage collector copies all reachable blocks to to-space and adjusts all the pointers. The garbage collector thinks about data like this:

The garbage collector copies live data through a set of mutually recursive operations:

The garbage collector and VM registers

Every VM register whose contents might affect a future computation is either a register in the window of the currently active function, or it is a register in the window of some function that is suspended awaiting a return. Such registers are considered live. Registers with numbers larger than the largest register used in the currently active function are not live, and their contents cannot affect future computations.
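The liveness rule can be sketched as a simple calculation. The sketch assumes that all registers sit in one array, that the windows of suspended functions lie below the active window, and that the number of registers used by the active function (nregs) is known. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_REGISTERS = 64 };
typedef struct { int tag; void *payload; } Value;   /* hypothetical layout */

typedef struct {
    Value registers[NUM_REGISTERS];
    Value *window;      /* register 0 of the currently active function */
} VMState;

/* Number of live registers, counted from the bottom of the register array.
   nregs is how many registers the active function uses (an assumption:
   a real VM would have to record this somewhere). */
size_t live_register_count(const VMState *vm, size_t nregs) {
    size_t below = (size_t)(vm->window - vm->registers);  /* suspended windows */
    assert(below + nregs <= NUM_REGISTERS);
    return below + nregs;
}

/* Is the register at absolute index i live? */
int is_live(const VMState *vm, size_t nregs, size_t i) {
    return i < live_register_count(vm, nregs);
}
```

For example, with the window starting at absolute register 10 and an active function that uses 5 registers, registers 0 through 14 are live and register 15 is not.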

The garbage collector scans all live registers and forwards every pointer contained in a live register. The garbage collector relies on this invariant: every pointer contained in a live register points to a valid heap object. This invariant must be established and maintained throughout the execution of a program. But in a naïve implementation, it isn’t:

The potential fault is subtle, but in past SVMs it has tripped up more than one person. This bug can be fixed in any of three ways:

I recommend that your garbage collector zero all unscanned registers. The other two fixes are available for depth points.
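The recommended fix might be sketched as follows. This is an illustration under assumptions, not a required design: the Value layout, the forward callback, and the choice of an all-zero nil value are hypothetical. The function forwards every live register and then clears every register it did not scan, so that afterward no unscanned register holds a pointer at all.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_REGISTERS = 64 };
typedef enum { Nil = 0, Number, String } Tag;
typedef struct { Tag tag; void *payload; } Value;   /* hypothetical layout */

/* Forward every pointer in registers[0..nlive), then clear the rest. */
void scan_registers(Value registers[NUM_REGISTERS], size_t nlive,
                    void (*forward)(Value *)) {
    assert(nlive <= NUM_REGISTERS);
    for (size_t i = 0; i < nlive; i++)
        forward(&registers[i]);
    for (size_t i = nlive; i < NUM_REGISTERS; i++)
        registers[i] = (Value){ .tag = Nil, .payload = NULL };
}

static size_t visits;

/* Stand-in for the collector's forwarding function: it only counts calls. */
void count_visit(Value *v) { (void)v; visits++; }

/* Returns 1 if exactly the live registers were visited
   and a stale pointer in a dead register was cleared. */
int demo_scan(void) {
    static int stale_target;
    Value registers[NUM_REGISTERS] = { { Nil, NULL } };
    registers[40] = (Value){ .tag = String, .payload = &stale_target };
    visits = 0;
    scan_registers(registers, 12, count_visit);
    return visits == 12
        && registers[40].tag == Nil
        && registers[40].payload == NULL;
}
```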


  1. Just wait…