ldb represents symbol tables as PostScript programs, which its embedded PostScript interpreter evaluates as necessary. This approach provides a machine-independent mechanism for representing symbol tables that contain both code and data, shields the debugger from irrelevant information, and supports machine-independent expression evaluation.
ldb is an experiment in coupling a compiler and a debugger. In most systems, compiler and debugger are connected only by machine-dependent symbol-table data. In some experimental systems, the compiler and debugger execute in the same address space, calling each other and sharing data structures. ldb and lcc execute separately, but ldb depends on and reuses existing compiler functionality as much as possible. For example, ldb uses a variant of lcc as an ``expression server,'' which implements assignment and expression evaluation by translating C to PostScript. Making modest demands on the compiler simplifies the debugger substantially.
ldb's design embodies engineering choices that minimize and isolate machine-dependent code. For example, it controls target processes with a small ``debug nub'' that is loaded with the target program. It attaches to targets dynamically, exchanging messages with this nub using a machine-independent protocol. ldb's breakpoint implementation is largely machine-independent; the only machine-dependent code implements control-flow analysis, which takes about 50 lines per target.
ldb's machine-dependent code depends only on the architecture the target program runs on, not on the architecture ldb runs on. As a result, cross-architecture debugging is identical to single-architecture debugging, and ldb can change architectures dynamically.
Machine code
The New Jersey Machine-Code Toolkit helps programmers write applications that process machine code, like code generators, linkers, profilers, and debuggers. It turns symbolic manipulations of instructions into bit manipulations, guided by a specification that maps between symbolic and binary representations of instructions.
Without the toolkit, application writers must either work with text, using native assemblers and disassemblers, or else implement encoding and decoding by hand, using different ad hoc techniques for different architectures. The toolkit automates encoding and decoding with a single technique that works on multiple architectures.
The toolkit's specification language is simple, and it is designed to resemble the instruction descriptions found in architecture manuals. To guarantee consistency, it uses a single, bidirectional construct to describe both encoding and decoding. The toolkit checks specifications for unused constructs, underspecified instructions, and inconsistencies. An instruction set can be specified with modest effort; our MIPS, SPARC, and Intel 486 specifications are 127, 193, and 460 lines long, respectively.
The toolkit has been used to reduce retargeting effort in two applications. ldb uses the toolkit for its MIPS disassembler, for which it needs fewer than 100 lines of machine-dependent code. The toolkit supports relocation as well as encoding; a retargetable linker that uses the toolkit does relocation in only 20 lines of machine-independent code. A previous version required 450 lines of encoding and relocation code for the MIPS alone.
The toolkit provides other practical benefits. By hiding shift and mask operations, by replacing case statements with matching statements, and by checking specifications for consistency, the toolkit reduces the possibility of error. The toolkit can speed up applications that would otherwise have to generate assembly language instead of binary code. For example, the linker mentioned above emits executable files 1.7 to 2 times faster by using the toolkit to write machine code instead of writing assembly language and using native assemblers.