The garbage collector and the VM state
During garbage collection, objects on the heap move. When an object moves, the garbage collector adjusts every pointer to the object so that it points to the new location. To adjust pointers successfully, the collector needs access to the entire VM state. According to the operational semantics, that state is \[\langle I_1 \bullet I_2, R, L, G, S, \sigma\rangle\text.\] If you care about efficiency, you probably have (at least) the instruction stream \(I_1 \bullet I_2\), the program counter \(\bullet\), and the register-window pointer \(R\) stored in local variables of vmrun
—not in the struct VMState
. But the garbage collector lives in file vmheap.c
and doesn’t have access to those local variables. What to do? The solution is to treat the local variables as a cache.
While the mutator (that is, the VM code that you care about) is running, part of the VM state should be kept in local variables of vmrun
. But before the garbage collector runs, that part of the state (the relevant local variables) should be saved in the struct VMState
, leaving the entire VM state on the C heap and visible to the garbage collector. And after the garbage collector terminates, the partial VM state, which may have changed, should be loaded from the struct VMState
back into the relevant local variables.
To perform the save and load operations, I recommend defining macros VMSAVE
AND VMLOAD
.
The garbage collector and the VM heap
The garbage collector doesn’t manage space for VM values. Values are stored in VM registers or even in local variables of C functions. But a value can point to a memory block allocated on the VM heap; that memory block is the value’s payload.
The garbage collector copies all reachable blocks to to-space, and to adjust all the pointers. The garbage collector thinks about data like this:
A memory block is the unit of allocation. A block may contain pointers and may also contain other blocks.
Things like a
struct Value
orstruct Activation
, which are not typically allocated on the managed heap, might nevertheless be treated as memory blocks for purposes of scanning.A pointer points to a memory block. The type of the memory block, and possibly its size, are inferred from a tag, which is not stored in the block; the tag is typically stored alongside the pointer in a
struct Value
.
The garbage collector copies live data through a set of mutually recursive operations:
If you have a pointer, forward it.
If you have a memory block, scan it.
Forwarding a pointer may copy the memory block to which it points, setting that block up for later scanning.
Scanning a block forwards any pointers that it contains (and also scans any blocks that it contains).
A block is set up for scanning by pairing a pointer to the block with a tag identifying the type of the block. This pair, which is represented as a
Value
, is then placed at the head of the gray list.
The garbage collector and VM registers
Every VM register whose contents might affect a future computation is either a register in the window of the currently active function, or it is a register in the window of some function that is suspended awaiting a return
. Such registers are considered live. Registers with numbers larger than the largest register used in the currently active function are not live, and their contents cannot affect future computations.
The garbage collector scans all live registers and forwards every pointer contained in a live register. The garbage collector relies on this invariant: every pointer contained in a live register points to a valid heap object. This invariant must be established and maintained throughout the execution of a program. But in a naïve implementation, it isn’t:
When the VM is initialized, the invariant is trivially true: every register is
nil
, so no register contains any pointers.When the garbage collector runs, it adjusts every pointer in every live register, so the invariant holds for every register that is scanned. Higher-numbered registers aren’t scanned, but they aren’t live, so there appears to be no problem.1
During the execution of every VM instruction except calls, when given inputs that contain only valid pointers, produces an output that contains only valid pointers.
Immediately after a call, the register window shifts, and \(N\) new registers appear to the garbage collector to be live. But they aren’t initialized, and in fact they may contain stale pointers from registers that were dead at the previous collection.
The potential fault is subtle, but in past SVMs it has tripped up more than one person. This bug can be fixed in any of three ways:
The garbage collector is called only at “safe points”: every
cons
, everymkclosure
, and every call. Each safe point is labeled by the UFT with the registers that the UFT knows are live at that point. The contents of those registers are guaranteed (by the UFT) not to be stale. The fix changes the collector to use the label; instead of scanning every register \(r\) in the range \(0 \le r < \mathtt{nregs}\), the fixed collector scans only those registers that the UFT labels as live.This fix results in the most efficient code at run time, but it’s complicated to implement.
After the collection, the collector can be fixed so it zeroes out all unscanned registers. Using this fix, the system maintains a stronger invariant: every pointer in every register, alive or dead, points to a valid heap object. This fix is easy to implement using the C standard function
memset
, but it’s a bit slow.Alternatively, the collector can maintain the stronger invariant more efficiently by zeroing out only those unscanned registers that have been in use since the last collection. This fix would require maintaining a “high-water mark” at each call, marking the highest numbered register potentially in use. The invariant would then be that no register above the high-water mark ever contains a bad pointer. Maintaining the high-water mark would be relatively cheap, as the high-water mark could be used in the overflow check.
I recommend that your garbage collector zero all unscanned registers. The other two fixes are available for depth points.