The design of a binary format for virtual-machine instructions is governed by these considerations:
Decoding is done in software, so it has to be fast.
For similar reasons, and to simplify the implementation of goto, all instructions should be the same size.
Because virtual registers are more or less free, the instruction format shouldn’t limit them unnecessarily: a single instruction should be capable of naming a lot of different registers.
In an academic setting, the instruction format should be easy to debug.
The SVM instruction format is inspired by the MIPS instruction set developed at Stanford in the 1980s. This CPU architecture, which was used in the Sony PlayStation and PlayStation 2, used a 32-bit instruction word and supported 32 hardware registers. The SVM instruction format also nods to the instruction format of the Lua virtual machine, which supports 512 virtual registers.
An SVM instruction fits in a 32-bit word. It includes an 8-bit opcode in bits 24 to 31 (the most significant bits); the remaining 24 bits may be used to code three 8-bit register names, one 8-bit register name and a 16-bit index, or a 24-bit signed offset. I used 8 bits for the field sizes so that you could read off the fields of any instruction just by looking at the bits rendered in hexadecimal. An 8-bit field can name 256 different registers, and that will be enough for us.
The four bytes of the instruction are named OP, X, Y, and Z. The (unsigned) values of these fields are extracted by decoding functions opcode, uX, uY, and uZ.
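As a rough illustration of the byte layout just described, here is a minimal sketch of how the four decoding functions might be written in C. The Instruction typedef and the uint32_t return types are assumptions made for this example; the actual SVM source may declare these functions differently.

```c
#include <stdint.h>

/* Assumption for this sketch: an instruction is one unsigned 32-bit word. */
typedef uint32_t Instruction;

/* Each decoder shifts its field down to bit 0 and masks off 8 bits.
   OP is bits 24-31, X is bits 16-23, Y is bits 8-15, Z is bits 0-7. */
static inline uint32_t opcode(Instruction i) { return (i >> 24) & 0xff; }
static inline uint32_t uX(Instruction i)     { return (i >> 16) & 0xff; }
static inline uint32_t uY(Instruction i)     { return (i >>  8) & 0xff; }
static inline uint32_t uZ(Instruction i)     { return  i        & 0xff; }
```

Because every field is exactly one byte, the results can be checked by eye: for the arbitrary word 0x2a010203, opcode yields 0x2a, and uX, uY, and uZ yield 1, 2, and 3.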
The SVM supports four instruction formats:
The R3 format names an opcode and three registers:
OP X Y Z
Instructions in the R3 format are created by encoding functions that take up to three registers; the functions are named eR3, eR2, eR1, and eR0. Registers that are not supplied are filled in as zeroes. The R3 format is used for ALU instructions, like addition, comparison, and so on.
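The R3 encoders could look something like the following sketch. The Instruction and Opcode typedefs, the parameter types, and the choice to fill the trailing fields (Z, then Y, then X) with zeroes are assumptions for illustration only; the real encoding functions may differ in their signatures.

```c
#include <stdint.h>

typedef uint32_t Instruction;  /* assumption: one 32-bit word per instruction */
typedef uint8_t  Opcode;       /* assumption: the real SVM may use an enum */

/* Pack the opcode into bits 24-31 and the registers into X, Y, and Z. */
static inline Instruction eR3(Opcode op, uint8_t x, uint8_t y, uint8_t z) {
    return ((uint32_t)op << 24) | ((uint32_t)x << 16)
         | ((uint32_t)y << 8)   |  (uint32_t)z;
}

/* The shorter forms fill the registers that are not supplied with zeroes. */
static inline Instruction eR2(Opcode op, uint8_t x, uint8_t y) {
    return eR3(op, x, y, 0);
}
static inline Instruction eR1(Opcode op, uint8_t x) {
    return eR3(op, x, 0, 0);
}
static inline Instruction eR0(Opcode op) {
    return eR3(op, 0, 0, 0);
}
```

With a made-up opcode 0x2a, eR3(0x2a, 1, 2, 3) produces the word 0x2a010203, and eR1(0x2a, 1) produces 0x2a010000.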
The R2U8 format is just like the R3 format, except the Z field stands for an unsigned index, not a register number. Instructions in this format are created by encoding function eR2U8. This format is used for instructions that index into records at statically known offsets.
The R1U16 format names an opcode, one register, and a 16-bit unsigned index stored in the Y and Z bytes combined, written YZ:
OP X YZ
Instructions in the R1U16 format are created by encoding function eR1U16. The unsigned YZ field can be extracted by decoding function uYZ. The format is used for instructions that operate on a register and a literal, like load-literal, or on a register and a global variable, like setglobal. The YZ field is used as an unsigned 16-bit index into the literal pool or the global-register table.
The R0I24 format names an opcode and a 24-bit signed offset in the X, Y, and Z bytes combined, written XYZ:
OP XYZ
Instructions in the R0I24 format are created by encoding function eR0I24. The signed XYZ field can be extracted by decoding function iXYZ. The format is used for the goto instruction.
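The two wider fields deserve a closer look, because the signed XYZ field must be sign-extended when it is decoded. The sketch below shows one possible way to write uYZ, iXYZ, eR1U16, and eR0I24 in C. The typedefs, the parameter types, and the range check in eR0I24 are assumptions for this example, not the SVM's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Instruction;  /* assumption: one 32-bit word per instruction */
typedef uint8_t  Opcode;       /* assumption: the real SVM may use an enum */

/* YZ is the low 16 bits, read as an unsigned index. */
static inline uint32_t uYZ(Instruction i) { return i & 0xffff; }

/* XYZ is the low 24 bits, read as a two's-complement offset.
   Flipping the sign bit (bit 23) and then subtracting its weight
   sign-extends the field without relying on implementation-defined shifts. */
static inline int32_t iXYZ(Instruction i) {
    int32_t low24 = (int32_t)(i & 0xffffff);
    return (low24 ^ 0x800000) - 0x800000;
}

static inline Instruction eR1U16(Opcode op, uint8_t x, uint16_t yz) {
    return ((uint32_t)op << 24) | ((uint32_t)x << 16) | (uint32_t)yz;
}

static inline Instruction eR0I24(Opcode op, int32_t offset) {
    /* Hypothetical safety check: the offset must fit in 24 signed bits. */
    assert(offset >= -(1 << 23) && offset < (1 << 23));
    return ((uint32_t)op << 24) | ((uint32_t)offset & 0xffffff);
}
```

With a made-up opcode 7, eR0I24(7, -2) produces the word 0x07fffffe, and iXYZ applied to that word recovers -2. A backward goto is therefore just a negative offset in the XYZ field.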
The formats and the decoding functions tell you everything you need to know to write your vmrun function in module 1. The encoding functions won’t be used until module 2.