The New Jersey Machine-Code Toolkit (Abstract)

Norman Ramsey and Mary Fernández.

The New Jersey Machine-Code Toolkit helps programmers write applications that process machine code. Applications that use the toolkit are written at an assembly-language level of abstraction, but they recognize and emit binary. Guided by a short instruction-set specification, the toolkit generates all the bit-manipulating code.

The toolkit's specification language uses four concepts: fields and tokens describe parts of instructions, patterns describe binary encodings of instructions or groups of instructions, and constructors map between the assembly-language and binary levels. These concepts are suitable for describing both CISC and RISC machines; we have written specifications for the MIPS R3000, SPARC, and Intel 486 instruction sets.

We have used the toolkit to help write two applications: a debugger and a linker. The toolkit generates efficient code; for example, the linker emits binary up to 15% faster than it emits assembly language, making it 1.7-2 times faster to produce an a.out directly than by using the assembler.

The full paper is available in PostScript form, and also in a slightly lame automatic conversion to HTML.