% l2h ignore change {
\chapter{Excerpts from {\tt mld}, an encoding application}

[[mld]] is an optimizing, retargetable linker that links an intermediate code before generating object code for a target architecture. It performs the functions of a conventional assembler, i.e., it generates object code, and of a conventional linker, i.e., it assigns absolute addresses to data and text segments, places data and procedures in the appropriate segments, and relocates references to external symbols.

In this section, we provide some snippets of code from [[mld]] to illustrate how the library is used in an encoding application. Many of the examples are target independent and are used by [[mld]] for each target architecture. The target-dependent excerpts are for the MIPS.

\subsection{Using encoding procedures}

{\tt mld}'s code generators are based on those used in the {\tt lcc} compiler, which emit assembly code~\cite{fraser:retargetable}. Much of each code generator is generated from a BURG specification \cite{fraser:burg}, which contains rules for rewriting intermediate-code subtrees to assembly-language templates; the rest of the code generator is written by hand. Adapting a code generator means modifying both the BURG specification and the hand-written parts.

The toolkit simplifies those modifications. Code that prints assembly language is replaced with code that calls the encoding procedures generated by the toolkit; the following three rules are extracted from [[mld]]'s set of rules that map assembly-language templates to calls to encoding procedures. The format arguments [[%c]] and [[%0]] refer to the registers chosen by the code generator.
\begin{verbatim}
"cvt.d.s $f%c,$f%0"   cvt_d_s(%c, %0), nop()
"mov.d $f%c,$f%0"     mov_d(%c, %0)
"not $%c,$%0"         nor(%c, 0, %0)
\end{verbatim}
Hand-written assembly statements are replaced in a similar manner; for example,
\begin{quote}
[[print("fadds %f1, %f2, %f3");]]
\end{quote}
is replaced by
\begin{quote}
[[fadds(f1, f2, f3);]]
\end{quote}
a call to a toolkit-generated encoding procedure. The rules for mapping assembly to binary must be written for each target architecture.

\subsection{Using relocatable blocks}

The encoding procedures use the toolkit's library to write the generated code into relocatable blocks in {\tt mld}'s memory; {\tt mld} calls the library to create relocatable blocks and to set the current relocatable block.

At startup, {\tt mld} assigns one relocatable block to each data section by calling [[block_new]]. [[mld]] uses four of the MIPS data sections: read only ([[rdata]]), short data ([[sdata]]), initialized data ([[data]]), and uninitialized data ([[bss]]). [[mld]] pre-computes the size of each data section and passes these sizes to [[block_new]] as hints of the maximum size in bytes of each relocatable block. The following snippet initializes the blocks.
<<[[mld]] example>>=
rdata = block_new(rdata_size);
sdata = block_new(sdata_size);
data  = block_new(data_size);
bss   = block_new(bss_size);
@
[[mld]] assigns a relocatable block to each procedure for which it generates object code by calling the target-independent procedure [[new_procedure]]. In [[new_procedure]], [[text]] is a global variable that refers to the relocatable block of the current procedure. [[set_block]] is called first to set the current relocatable block to the current procedure. [[block_new]] is then called to create a new relocatable block; it uses [[size]] as a hint of the procedure's maximum size in bytes.
The procedure's absolute address in the text segment is set using the current [[pc]] as its start address; [[align(4)]] aligns the [[pc]] on a 4-byte word.\footnote{The current location in any text block should already be aligned on a 4-byte word. This is just a sanity check.} If the current [[pc]] is unknown, the base address of the code segment is used. Finally, [[set_block]] sets the current relocatable block to the new text block. All subsequent calls to encoding procedures will emit instructions in the new relocatable block.
<<[[mld]] example>>=
static void new_procedure(size, name)
  char *name;
{
  set_block(text);            /* current block: the previous procedure */
  text = block_new(size);     /* size is only a hint */
  text->label->name = name;
  align(4);                   /* sanity check; see the footnote */
  set_address(text, cur_pc_known() ? cur_pc() : segments[CODE].baseaddr);
  set_block(text);            /* emit into the new procedure from now on */
}
@
The adapted code generators also call the library directly, e.g., to emit data. The target-dependent [[segment]] procedure selects the segment in which data is emitted by setting the current relocatable block to the block for that segment.
<<[[mld]] example>>=
static void segment(int s)
{
  switch (s) {
  case LIT:  set_block(rdata); break;
  case DATA: set_block(data);  break;
  case BSS:  set_block(bss);   break;
  default:   assert(0);
  }
}
@

\subsection{Using relocatable addresses}

{\tt mld} associates a relocatable address with each label and global symbol. The target-independent procedure [[global]] defines a global symbol in the current data segment. If the symbol does not have a relocatable address, [[mld]] calls [[addr_new]] to create and initialize its address. An error message is issued if the symbol is already defined.
<<[[mld]] example>>=
static void global(p)
  Symbol p;
{
  <>
  if (!block_defined(p->x.raddr->label->block))
    label_define(p->x.raddr->label, p->x.raddr->offset);
  else
    error("Attempt to redefine %s\n", p->name);
}
<>=
if (p->x.raddr == (RAddr)0)
  p->x.raddr = addr_new(label_new(p->name), 0);
@
References to global symbols often appear in data segments, e.g., as the initial values of other global symbols.
[[defaddress]] emits the relocatable address of a symbol into the current data segment by calling [[emit_raddr]], the constructor defined in Section~\ref{sec:emitinfo} that emits a relocatable address.
<<[[mld]] example>>=
static void defaddress(p)
  Symbol p;
{
  <>
  emit_raddr(p->x.raddr);
}
@

\subsection{Relocation}

When an encoding procedure emits an instruction that refers to an unknown relocatable address, it creates a closure to save the information necessary for relocating the instruction. [[mld]] supplies the procedure [[mc_alloc_closure]], which creates a closure and saves it in a table associated with the current relocatable block in the current segment. The toolkit library initializes the returned closure.
@
<<[[mld]] example>>=
RClosure mc_alloc_closure(size_in_bytes, dest_block, dest_lc)
  unsigned size_in_bytes;
  RBlock dest_block;
  unsigned dest_lc;
{
  RClosure cl;
  cl = (RClosure)mc_alloc(size_in_bytes, RClosure_pool);
  /* the closure must belong to the current relocatable block */
  assert(dest_block == segments[cur_seg].relocblocks[cur_rb]);
  addtotable(RClosure *, &(segments[cur_seg].closures[cur_rb]), cl);
  return cl;
}
@
The choice of when to perform relocation is application dependent. [[mld]] performs relocation after it has assigned absolute addresses to all of the data segments and procedures and just before emitting an executable program. The toolkit provides most of the support for performing relocation in [[apply_closure]]. [[mld]]'s [[dispatch]] procedure applies [[apply_closure]] to every closure it has saved. If any relocatable address is unknown, [[apply_closure]] calls [[error]] to issue an error message. The implementations of [[save_closure]] and [[dispatch]] are target independent.
{\hfuzz=1.4pt\par}
<<[[mld]] example>>=
void dispatch()
{
  int seg, rb, i;
  RClosure cl;
  /* visit every closure saved for every relocatable block */
  for (seg = 0; seg < NSEGS; seg++)
    for (rb = 0; segments[seg].relocblocks[rb] != 0; rb++)
      gentable(segments[seg].closures[rb], i, 0) {
        cl = tableentry(RClosure *, segments[seg].closures[rb], i);
        apply_closure(cl, cl_emitm, &error);
      }
}
@

\subsection{Example makefile}

The following example makefile shows how {\tt tk.o}, an object file containing all of the encoding procedures needed by {\tt mld}, is created. {\tt mips.tk} is created by {\tt tools}, which takes two inputs: {\tt mips.spec}, the specification provided with the toolkit, and {\tt mld-mips.spec}, a specification file specific to {\tt mld}. The implementations of the encoding procedures are extracted from {\tt mips.tk} into {\tt tk.c} using {\tt notangle}; the interface prototypes are extracted into {\tt tk.h}. The library interface file, {\tt mclib.h}, is included by {\tt tk.c}.
\begin{verbatim}
mips.tk: mips.spec mld-mips.spec
	tools -byteorder fast -lc-cons-names -encoder mips.tk \
	      mips.spec mld-mips.spec

tk.c: mips.tk
	notangle -t8 -L mips.tk | cpif tk.c

tk.h: mips.tk
	notangle -t8 -L -Rheader mips.tk | cpif tk.h

tk.o: tk.c tk.h mclib.h mldtk.h
	cc -c -I./ tk.c
\end{verbatim}
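
The deferred relocation described in the Relocation subsection can be illustrated with a small, self-contained C sketch. It is not taken from [[mld]] or from the toolkit library. Every name in it ([[SketchLabel]], [[SketchClosure]], [[sketch_emit_ref]], [[sketch_dispatch]]) is hypothetical, and it assumes a simplified instruction format in which the low 16 bits of a 32-bit word hold the low 16 bits of the target address. The sketch saves a closure when the target is still unknown and patches the instruction later, once an address has been assigned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not the toolkit's API. */
typedef struct {
    int known;         /* nonzero once an absolute address is assigned */
    uint32_t address;  /* meaningful only when known is nonzero */
} SketchLabel;

typedef struct {
    uint32_t *slot;             /* instruction word to patch later */
    const SketchLabel *target;  /* label the instruction refers to */
} SketchClosure;

enum { SKETCH_MAX_CLOSURES = 16 };
static SketchClosure sketch_closures[SKETCH_MAX_CLOSURES];
static size_t sketch_nclosures = 0;

/* Emit a word whose upper 16 bits come from opcode_bits and whose lower
 * 16 bits refer to target (assumed format). If the target address is
 * unknown, save a closure instead of filling in the field.
 * Returns 0 on success, -1 if the closure table is full. */
static int sketch_emit_ref(uint32_t *slot, uint32_t opcode_bits,
                           const SketchLabel *target)
{
    *slot = opcode_bits & 0xFFFF0000u;
    if (target->known) {
        *slot |= target->address & 0xFFFFu;
        return 0;
    }
    if (sketch_nclosures == SKETCH_MAX_CLOSURES)
        return -1;
    sketch_closures[sketch_nclosures].slot = slot;
    sketch_closures[sketch_nclosures].target = target;
    sketch_nclosures++;
    return 0;
}

/* Apply every saved closure. Returns the number of closures whose
 * target address is still unknown; those words are left unpatched. */
static size_t sketch_dispatch(void)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < sketch_nclosures; i++) {
        const SketchClosure *cl = &sketch_closures[i];
        if (!cl->target->known) {
            unresolved++;
            continue;
        }
        *cl->slot |= cl->target->address & 0xFFFFu;
    }
    return unresolved;
}
```

A caller would invoke [[sketch_emit_ref]] while generating code, assign addresses to all labels, and then call [[sketch_dispatch]] once. A nonzero result corresponds to the case in which [[apply_closure]] reports an unknown relocatable address. The fixed-size table keeps the sketch short; it is a simplification, not a statement about how [[mld]] stores closures.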