We want to create an efficient bytecode interpreter on a tight budget. Luckily, a bytecode interpreter has a readily identifiable inner loop: the code that fetches an instruction, decodes it, and executes its operation. To make the inner loop fast, we’ll limit abstraction, limit calls to C functions, and give the C compiler a chance to put frequently used values in machine registers.
Limiting abstraction means avoiding opaque types like Hanson’s Seq_T, Array_T, and so on, at least for frequently used data. It should be possible to load an instruction from an instruction stream or a register from a register file without calling a function and without dereferencing more than one pointer. And we almost certainly want to do without run-time bounds checking; to keep pointer references valid, we should rely on other mechanisms. If we suspect difficulty, we can diagnose invalid references using Valgrind.

One diagnostic technique that costs almost nothing is to place blocks of uninitialized memory in the VMState structure. These blocks, sometimes called “guard blocks”, fit between and around memory blocks that are OK to reference. If something goes wrong, Valgrind can spot reads from this memory. It is even possible to make API calls to Valgrind to tell it that the guard blocks may not safely be read or written.

Limiting calls to C functions means either not calling functions in the first place or defining functions static inline. For example, because a single ALU instruction might project VM values into C values, operate on the C values, then embed the C result into a VM value, we want those embedding and projection functions to be inlined. And because we need to decode each instruction into its OP field and usually into its X, Y, and Z fields, those decoding functions also ought to be inlined. (Because encoding happens only once, at load time, and never in a dynamic loop, loading functions don’t need to be inlined.)

Giving the C compiler a chance means caching key components of the VM state in C variables that are local to the vmrun function. The hard part is identifying which components are key. The program counter is probably key, because it is used on every instruction. The current instruction word itself might be key, because most instructions will extract several fields from it. And you can’t cache all 256 registers, but something about registers probably ought to be cached, because most instructions use at least one register.

Which other values ought to be key is harder to decide. Should X, Y, and Z fields be stored in local C variables? I don’t know. Certainly not at the expense of other variables that might be more important. Definitive answers require developers to try alternatives, inspect the generated assembly code, and measure performance. We won’t go that far; instead, we’ll rely on educated guesses: reasoned choices among alternatives.

In addition to a run-time budget, we also have a tight programming budget. To use that budget wisely, we want to make each VM instruction easy to implement and easy to debug. That means making it easy to refer to registers, and also easy to do the embedding and projection that is one of the benefits of a virtual machine.
Don’t worry too much about errors and bugs. Focus on clean implementations of the instruction stream, the literal pool, and the register file. As long as you avoid fancy abstractions, clean code will be efficient, and if something does go wrong, it will be easy to debug.