Specifying Representations of Machine Instructions

Norman Ramsey and Mary Fernández.

Abstract

We present SLED, a Specification Language for Encoding and Decoding, which describes abstract, binary, and assembly-language representations of machine instructions. Guided by a SLED specification, the New Jersey Machine-Code Toolkit generates bit-manipulating code for use in applications that process machine code. Programmers can write such applications at an assembly-language level of abstraction, and the toolkit enables the applications to recognize and emit the binary representations used by the hardware.

SLED is suitable for describing both CISC and RISC machines; we have specified representations of MIPS R3000, SPARC, Alpha, and Intel Pentium instructions, and toolkit users have written specifications for the Power PC and Motorola 68000. The paper includes representative excerpts from our SPARC and Pentium specifications.

SLED uses four elements: fields and tokens describe parts of instructions, patterns describe binary representations of instructions or groups of instructions, and constructors map between the abstract and binary levels. By combining the elements in different ways, SLED supports machine-independent implementations of machine-level concepts like conditional assembly, span-dependent instructions, relocatable addresses, object code, sections, and relocation. SLED specifications can be checked automatically for consistency with existing assemblers.
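To make the four elements concrete, here is a minimal sketch in Python rather than in SLED itself. It models one 32-bit token divided into fields, a pattern that constrains some of those fields, and a constructor that maps abstract operands to a binary word. The bit positions follow the register form of the SPARC ADD instruction; the names FIELDS, pack, unpack, ADD_PATTERN, add, and matches are hypothetical, chosen for illustration, and are not the toolkit's generated interface.

```python
# Illustrative only: a hand-written model of fields, patterns, and a
# constructor. None of these names come from the toolkit.

# Fields: named bit ranges (lo, hi), inclusive, within one 32-bit token.
FIELDS = {
    "op": (30, 31),
    "rd": (25, 29),
    "op3": (19, 24),
    "rs1": (14, 18),
    "i": (13, 13),
    "rs2": (0, 4),
}


def pack(values):
    """Place each field value at its bit position and return the word."""
    word = 0
    for name, value in values.items():
        lo, hi = FIELDS[name]
        width = hi - lo + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lo
    return word


def unpack(word):
    """Extract every field from a 32-bit word."""
    return {
        name: (word >> lo) & ((1 << (hi - lo + 1)) - 1)
        for name, (lo, hi) in FIELDS.items()
    }


# Pattern: constraints on fields that identify the register form of ADD.
ADD_PATTERN = {"op": 2, "op3": 0, "i": 0}


def add(rs1, rs2, rd):
    """Constructor: map abstract operands to the binary representation."""
    return pack({**ADD_PATTERN, "rs1": rs1, "rs2": rs2, "rd": rd})


def matches(word, pattern):
    """Decoder side: does the word satisfy the pattern's constraints?"""
    fields = unpack(word)
    return all(fields[name] == value for name, value in pattern.items())
```

For example, add(1, 2, 3) corresponds to the assembly-language instruction add %g1, %g2, %g3, and matches(add(1, 2, 3), ADD_PATTERN) recognizes the resulting word. The sketch covers only a single instruction form; it does not attempt the normal form or the machine-independent features described in the abstract.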

The implementation of the toolkit is largely determined by our representations of patterns and constructors. We use a normal form that facilitates construction of encoders and decoders. The paper describes the normal form and its use.

The toolkit has been used to help build several applications. We have built a retargetable debugger and a retargetable, optimizing linker. Colleagues have built a dynamic code generator, a decompiler, and an execution-time analyzer. The toolkit generates efficient code; for example, the linker emits binary up to 15% faster than it emits assembly language, making it 1.7 to 2 times faster to produce an a.out directly than by using the assembler.

Full paper

The final PDF for the paper (328K) is available. There is also a PostScript version (278K) and an HTML version (129K), but the HTML was generated automatically from LaTeX source, and the result isn't always readable.