Relocating Machine Instructions by Currying (Abstract)

Relocation adjusts machine instructions to account for changes in the locations of the instructions themselves or of external symbols to which they refer. Standard linkers implement a finite set of relocation transformations, suitable for a single architecture. These transformations are enumerated, named, and engraved in a machine-dependent object-file format, and linkers must recognize them by name. These names and their associated transformations are an unnecessary source of machine-dependence.

An alternative is to use SLED (Specification Language for Encoding and Decoding) to specify representations of machine instructions. Instructions are described by constructors, which denote functions mapping lists of operands to instructions' binary representations. Any operand can be designated as ``relocatable,'' meaning that the operand's value need not be known at the time the instruction is encoded. From a SLED specification, the New Jersey Machine-Code Toolkit can generate functions that encode instructions in the native binary representation. For instructions with relocatable operands, the toolkit also computes relocating transformations. Tool writers can create machine-independent software that uses these transformations to relocate machine instructions. For example, mld, a retargetable linker built with the toolkit, needs only 20 lines of C code for relocation, and that code is machine-independent.

The toolkit discovers relocating transformations by currying encoding functions. An attempt to encode an instruction with a relocatable operand results in the creation of a closure. The closure can be applied when the values of the relocatable operands become known. Currying provides a general, machine-independent method of relocation.

Currying rewrites a lambda-term into two nested lambda-terms. The standard implementation has the first lambda allocate a closure and store therein its operands and a pointer to the second lambda. Using this strategy in the toolkit means that, when it builds an application, the toolkit generates code for many different inner lambda-terms---one for each instruction that uses a relocatable address. Hoisting some of the computation out of the second lambda into the first makes many of the second lambdas identical---a handful are enough for a whole instruction set. This optimization reduces the size of machine-dependent assembly and linking code by 15--25% for the Alpha, MIPS, SPARC, and PowerPC, and by about 40% for the Pentium. It also makes the second lambdas equivalent to relocating transformations named in standard object-file formats.

The full paper is available in PostScript form. The paper contains too much mathematics to make an HTML conversion useful.