How to port ldb

Norman Ramsey
Summer 1991

Basic Assumptions

ldb assumes that its target is a 32-bit, two's complement machine. This assumption is not pervasive; it can be changed provided the following properties still hold: These assumptions are patently false in the floating-point domain, unless both host and target happen to be IEEE floating-point machines. The integer assumptions are less problematic, as two's-complement representation is nearly universal.

One could consider a revision of the treatment of floating point values, including a plan for sending floating-point arithmetic to the target when necessary. One might also extend the PostScript interpreter to deal with Machine.Value directly, eliminating the need to consider the other PostScript types.

The ldb source

If the root of the ldb directory hierarchy is ldb, the notable subdirectories are:
ldb/doc --- Documentation
ldb/lcc --- The lcc compiler, modified for use with ldb
ldb/lib --- PostScript code to support C, symbol tables, and debugger startup
ldb/prose --- Notes to myself (breeding ground for papers)
ldb/src --- The source hierarchy
ldb/tools --- Supporting tools

Building ldb requires noweb, a literate programming tool, and mk, Andrew Hume's replacement for make. noweb lives in /import/misc/noweb, and you officially don't have and have never heard of mk.

Porting ldb requires some understanding of the code. At minimum, read ``Important abstractions in ldb.'' ldb/src/abstracts.dvi contains a summary of the source code; you can make a new one with mk. Here's how the source is organized:

ldb/src/bin --- Binaries (object code) and other derived objects
ldb/src/cdb --- Machine-independent code for a C debugger
ldb/src/debugnub --- The debug nub, nominally machine-independent, but actually loaded with #ifdefs
ldb/src/interp --- PostScript interpreter
ldb/src/mips --- MIPS-dependent code
ldb/src/sparc --- SPARC-dependent code
ldb/src/startup --- Modified startup code (crt0.s)
ldb/src/test --- Miscellany, including some test programs
ldb/src/vax --- VAX-dependent code
I often refer to files by giving the path relative to ldb/src; e.g. ``cdb/Frame.nw.''

Every target requires at least four files. If the name of the target is Mix[When talking about an arbitrary target architecture, I call it ``Mix,'' in honor of the world's first polyunsaturated computer.], then the four files are:

mix/MixConfig.nw --- Interface and module for defining and installing an Architecture.T for Mix.
mix/MixFrame.nw --- Interface and module for finding and walking the Mix call stack.
mix/MixNub.nw --- Mix-dependent C code for the debug nub.
startup/Mixcrt0.s --- Assembly source for the modified runtime startup code.
The sections below describe how to create these four files, as well as what else has to be done to port ldb.

ldb's runtime support

There are two library directories mentioned in bin/mkfile: PSDIR and NUBDIR. PSDIR is ldb/lib, which holds ldb's supporting PostScript code (see page [->]). NUBDIR is a directory you create to hold object files and symbol tables; the lcc driver will look there for the debug nub and startup code (see page [->]). Only one PSDIR is needed, but you need a different NUBDIR for each target architecture you intend to support.

Porting lcc

When debugging is called for, the compiler handles two tasks: emitting a symbol table for each compilation unit, and linking full programs with a debug nub. The symbol table format is described in a separate document. There is no documentation anywhere of the sneaky PostScript code (in newstab.ps) that makes it easier for a compiler to emit a correct symbol table.

The compiler

Refer to newstab.c plus PostScript support.

The driver

Don't forget to add the architecture to the common loader, ldb-ld!

(The text that follows is adapted from the installation guide for lcc.)

The preprocessor, compiler, assembler, and loader are invoked by a driver program, lcc, which is similar to cc on most systems. It's described in the man page ldb/lcc/etc/lcc.1. The driver is built by combining the host-independent part, ldb/lcc/lcc.c, with a small host-specific part. By convention, host-specific parts are named hostname.c, where hostname is the local name for the host on which lcc is being installed. etc holds many examples. Comments in most give the details of the particular host; pick one that is closely related to your host, copy it to ldb/lcc/yourhostname.c, and edit it as described below. You should not have to edit ldb/lcc/lcc.c.

Debug your version of the driver by running it with the -v -v options, which cause it to echo the commands it would execute, but not to execute them.

Here's an altered version of ldb/lcc/gharlane.c, which we'll use as an example in describing how to edit a host-specific part. This example illustrates all of the important features.

/* big-endian MIPS running MIPS's UNIX System V at PARC */

#include <string.h>

char *cpp[] = {                      /* GNU preprocessor */
        "/import/lcc/lib/gcc-cpp", "-undef", 
        "-DLANGUAGE_C", "-D_LANGUAGE_C", "-Dmips", "-Dhost_mips", 
        "-DSYSTYPE_BSD43", "-D_SYSTYPE_BSD43", 
        "-DMIPSEB", "-D_MIPSEB", "$1", "$2", "$3", 0 };
char *com[] = { "/import/lcc/lib/rcc", "$1", "$2", "$3", 0 };
char *include[] = { "-I/import/lcc/include/ansi", 
        "-I/bsd43/usr/include", "-I/usr/include", 0 };
char *as[] = { "/usr/bin/as", "-o", "$3", "$1", "-nocpp", "$2", 0 };
char *ld[] = { "/usr/bin/ld", "-systype", "/bsd43/", "-o", "$3", "-nocount",
        "/bsd43/usr/lib/cmplrs/cc/crt1.o", "-count", "$1", "$2", "", "", "",
        "-nocount", "-lc", "/bsd43/usr/lib/cmplrs/cc/crtn.o", 0 };

/* stab[] and linkstab[] are specific to ldb */

#ifdef PICKLE
char *stab[] = { "/import/ldb/lib/interp",
                 "(yy.stab.c) readstab (", "$1", ".pkl) Pickle", 0 };
#define linkopt "-pkl"
#else
char *stab[] = { "/bin/mv", "yy.stab.c", "$1", 0 };
#define linkopt "-nopkl"
#endif

char *linkstab[] = { "/import/lcc/bin/linkstab", linkopt,
  "-architecture", "mips",
  "-o", "$3", "$1", "$2", "/import/ldb/lib/Cnub.o", 0};

int option(arg) char *arg; {
        if (strcmp(arg, "-g") == 0)
                ;
        else if (strcmp(arg, "-p") == 0) {
                ld[6] = "/bsd43/usr/lib/cmplrs/cc/mcrt1.o";
                ld[10] = "/bsd43/usr/lib/cmplrs/cc/libprof1.a";
        } else if (strcmp(arg, "-b") == 0 
                   && access("/usr/local/lib/bbexit.o", 4) == 0)
                ld[11] = "/import/lcc/lib/bbexit.o";
        else if (strcmp(arg, "-G") == 0) { /* danger: -G incompatible with -p */
                com[0] = "/import/lcc/ldb/lcc/gen2/mips/rcc";
                ld[6]  = "/import/ldb/lib/crt1.o";
                ld[12]  = "/import/ldb/lib/Cnub.o";
        } else
                return 0;
        return 1;
}

Most of the host-specific code is data that gives prototypes for the commands that invoke the preprocessor, compiler, assembler, symbol table linker, and loader. Each command prototype is an array of pointers to strings terminated with a null pointer; the first string is the full path name of the command and the others are the arguments or argument placeholders, which are described below.

The cpp array gives the command for running the preprocessor. lcc is intended to be used with an ANSI preprocessor, such as the GNU C preprocessor available from the Free Software Foundation. If the GNU preprocessor is used, it must be named gcc-cpp in order for lcc's -N option to work correctly.

Literal arguments specified in prototypes, e.g., "-Dmips" in the cpp command above, are passed to the command as given.

The strings "$1", "$2", and "$3" in prototypes are placeholders for lists of arguments that are substituted in a copy of the prototype before the command is executed. $1 is replaced by the options specified by the user; for the preprocessor, this list always contains at least -Dunix and -D__LCC__. $2 is replaced by the input files, and $3 is replaced by the output file.

Zero-length arguments after replacement are removed from the argument list before the command is invoked. So, e.g., if the preprocessor is invoked without an output file, "$3" becomes "", which is removed from the final argument list.
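The substitution rules above can be sketched in C. This is a minimal illustration, not the code in ldb/lcc/lcc.c; the names expand_prototype and append_list, and the convention of passing each list as a null-terminated array, are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Append every non-empty string of a null-terminated list to out[n...].
   Strings that would not fit below index max are silently dropped. */
static size_t append_list(char **out, size_t n, size_t max, char *const *list)
{
    for (; list != NULL && *list != NULL; list++)
        if ((*list)[0] != '\0' && n < max)
            out[n++] = *list;
    return n;
}

/* Hypothetical helper: build an argument vector from a command prototype.
   "$1" expands to the user's options, "$2" to the input files, and "$3" to
   the output file. Zero-length arguments are removed. out must have room
   for max pointers (max >= 1), including the terminating null pointer.
   Returns the number of arguments stored. */
size_t expand_prototype(char *const *proto, char *const *opts,
                        char *const *inputs, char *const *output,
                        char **out, size_t max)
{
    size_t n = 0;
    for (; *proto != NULL; proto++) {
        if (strcmp(*proto, "$1") == 0)
            n = append_list(out, n, max - 1, opts);
        else if (strcmp(*proto, "$2") == 0)
            n = append_list(out, n, max - 1, inputs);
        else if (strcmp(*proto, "$3") == 0)
            n = append_list(out, n, max - 1, output);
        else {
            char *const one[] = { *proto, NULL };   /* literal argument */
            n = append_list(out, n, max - 1, one);
        }
    }
    out[n] = NULL;
    return n;
}
```

With an empty output file, the "$3" slot contributes nothing, so the final argument list ends with the input files.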

For example, to specify a preprocessor command prototype to invoke /bin/cpp with the options -Dvax and -Dultrix, the cpp array would be

char *cpp[] = { "/bin/cpp", "-Dvax", "-Dultrix",
        "$1", "$2", "$3", 0 };

The include array is a list of -I options that specify which directories should be searched to satisfy include directives. These directories are searched in the order given. The first directory should be the one to which the ANSI header files were copied in the lcc installation guide. The driver adds these options to cpp's arguments when it invokes the preprocessor, except when -N is specified.

com gives the command for invoking the compiler. This prototype can appear exactly as shown above, except that the command name should be edited to reflect the location of the compiler chosen in the lcc installation guide.

as gives the command for invoking the assembler. ld gives the command for invoking the loader. For the other commands, the list $2 contains a single file; for ld, $2 contains all `.o' files and libraries, and $3 is a.out, unless the -o option is specified. As suggested in the code above, ld must also specify the appropriate startup code and default libraries.

stab gives the command for invoking the symbol table pickler. This pickler converts a PostScript symbol table into a binary representation, which can be read much more quickly than the ASCII representation. Pickling is disabled on all architectures because the Modula-3 pickling software is broken.

linkstab gives the command for invoking the symbol table linker. It is run after the loader, whose output it scans for the locations of global symbol table information. It builds a ``loader table'' for an entire program, which includes the symbol tables of all the components. The option -pkl or -nopkl determines whether the loader table is pickled. The option -proctable[*] tells linkstab to generate a generic procedure table (see page [->]); most targets use one, but the MIPS is an exception. The -architecture option gives the name of the architecture of the target machine for placement into the symbol table.[*] This name should be the name used in the startup PostScript code (see page [->]). The other arguments to linkstab are the loader options, the loader files, and the absolute pathname of the nub object code. The ldb nub installation procedure puts the debug nub in the NUBDIR given in ldb/src/bin/mkfile (see page [->])[*]; use that path name.

The option function is described below; for now, use an existing option function or one that returns 0 except when the argument is "-G". The -G option must cause the driver to change the ld[] command to use the special startup code that calls DebugNub_main (see page [->]), and to include the debug nub object in the a.out file.
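As a starting point, a minimal option function that accepts only -G might look like the sketch below. The ld prototype, the slot numbers, and the paths under /usr/local/lib/ldb/mix are hypothetical values for an imaginary Mix host; substitute your own loader prototype and your NUBDIR.

```c
#include <string.h>

/* Hypothetical loader prototype for a Mix host. Slot 3 is the startup code;
   slot 6 is an empty placeholder that -G fills in with the debug nub. */
char *ld[] = { "/bin/ld", "-o", "$3", "/lib/crt0.o", "$1", "$2", "", "-lc", 0 };

/* Accept -G and reject everything else. */
int option(char *arg)
{
    if (strcmp(arg, "-G") == 0) {
        ld[3] = "/usr/local/lib/ldb/mix/Mixcrt0.o"; /* assumed NUBDIR path */
        ld[6] = "/usr/local/lib/ldb/mix/Cnub.o";    /* assumed NUBDIR path */
        return 1;
    }
    return 0;
}
```

Because the placeholder in slot 6 is a zero-length string, it vanishes from the ld command whenever -G is not given.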

After specifying the prototypes, compile the driver by

$ cd etc
$ make HOST=gharlane
where gharlane is replaced by yourhostname. Run the resulting a.out with the options -v -v to display the commands that would be executed, e.g.,
$ a.out -v -v foo.c baz.c mylib.a -lcurses
a.out version 1.6
foo.c:
/import/lcc/lib/gcc-cpp -undef -DLANGUAGE_C -D_LANGUAGE_C -Dmips 
    -Dhost_mips -DSYSTYPE_BSD43 -D_SYSTYPE_BSD43 -DMIPSEB -D_MIPSEB 
    -Dunix -D__LCC__ -v -I/import/lcc/include/ansi 
    -I/bsd43/usr/include -I/usr/include foo.c | 
    /import/lcc/lib/rcc -v - /tmp/lcc16778.s
/usr/bin/as -o foo.o -nocpp /tmp/lcc16778.s
baz.c:
/import/lcc/lib/gcc-cpp -undef -DLANGUAGE_C -D_LANGUAGE_C -Dmips 
    -Dhost_mips -DSYSTYPE_BSD43 -D_SYSTYPE_BSD43 -DMIPSEB -D_MIPSEB 
    -Dunix -D__LCC__ -v -I/import/lcc/include/ansi
    -I/bsd43/usr/include -I/usr/include baz.c |
    /import/lcc/lib/rcc -v - /tmp/lcc16778.s
/usr/bin/as -o baz.o -nocpp /tmp/lcc16778.s
/usr/bin/ld -systype /bsd43/ -o a.out -nocount /bsd43/usr/lib/cmplrs/cc/crt1.o
    -count foo.o baz.o mylib.a -lcurses -nocount -lc 
    /bsd43/usr/lib/cmplrs/cc/crtn.o
rm /tmp/lcc16778.s
Leading spaces indicate lines that have been folded manually to fit this page. Note the use of a pipeline to connect the preprocessor and compiler. lcc arranges this pipeline itself; it does not call the shell.

As the output shows, lcc places temporary files in /tmp. Alternatives can be specified by defining TEMPDIR in CFLAGS when making the driver, e.g.,

$ make CFLAGS='-DTEMPDIR=\"/usr/tmp\"' HOST=gharlane
causes lcc to place temporary files in /usr/tmp.

Once the driver is completed, install it by

$ cp a.out /usr/local/bin/lcc
where the destination is the location chosen for lcc in the lcc installation guide.

The option function is called for the options -G, -g, -p, -pg, and -b because these compiler options might also affect the loader's arguments. For these options, the driver calls option(arg) to give the host-specific code an opportunity to edit the ld prototype, if necessary. option can change ld, if necessary, and return 1 to announce its acceptance of the option. If the option is unsupported, option should return 0.

For example, in response to -G, the option function shown above changes ld[12] from "" to "$NUBDIR/Cnub.o", which causes the debug nub to be loaded. If -G is not specified, the "" argument is omitted from the ld command because it's empty.

Likewise, -p causes option to change the name of the startup code and to use a profiling library. option should be written to support simultaneous use of -G and -p, but it wasn't.

To support Sun's -f68881 option, the driver also passes any option beginning with -f to option.

The option -Woarg causes the driver to pass arg to option. Such options have no other effect; this mechanism is provided to support system-specific options that affect the commands executed by the driver.

To complete the driver, write an appropriate option function for your system, and make and install the driver as described above.

ldb's PostScript code

[*]

Overview

ldb uses an embedded PostScript interpreter to read symbol table information, to manipulate the target machine, and to print values. The PostScript code is found in ldb/lib, which is called PSDIR in the mkfile (as described on page [->]). Here are the interesting files in that directory:

C.ps --- Support for the printing code emitted by the C compiler, including procedures for printing variables of all the basic types and support for the various type constructors.
Formatter.ps --- PostScript version of the Modula-3 Formatter interface.
interp.rc --- Startup code for the standalone interpreter. This version includes ldb.rc.
ldb.rc --- Interpreter initialization for ldb.
newstab.ps --- Support for reading and merging symbol tables (stabs).
Pkl.ps --- PostScript version of the Modula-3 Pkl interface.

C.ps makes some assumptions about the mapping of C types to machine types. It assumes 8-bit characters, 16-bit short integers, and 32-bit integers and long integers. These are the same assumptions lcc makes, and they're not likely to change much between machines. C.ps relies on the Location.T constructors, operations from the Machine and Location interfaces, and a few special PostScript operators implemented in Modula-3. These operators are:
C.Literal --- Convert a character or string to its literal representation.
C.StringLiteral --- Fetch a C string from the target, handling addressing errors. To avoid fetching long strings, the length of the maximum string must be specified. The C.ps code keeps this length in the $StringLimit variable.
C.Fnaddress --- A dreadful hack that finds the name of a procedure associated with a given offset by calling TargetF.ProcedureAt.

Of these operators, C.Literal is the only one that could conveniently be implemented in PostScript, but it would be senseless to do so because that operation is already part of the InterpPrint interface.

newstab.ps contains all the code that supports reading, checking, and merging of symbol tables. The format of symbol tables is described elsewhere. The current version is a mess, but it works, so take it as given. Note, however, the following two procedures for creating symbol table entries for registers and floating point registers:

name location --- regsym --- name stabdict --- Make register stab.
name location --- fregsym --- name stabdict --- Make floating register stab.

ldb.rc defines procedures for including files, includes the necessary files, defines all known target architectures, and sets up the interpreter. In the next section I describe how to add an architecture.

Porting the PostScript code

You shouldn't have to touch newstab.ps, Formatter.ps, or Pkl.ps. You might have to change C.ps if the mapping of C types to machine types is radically different on the MIX, but the changes should be confined to the basic types, and you should be able to make the right changes just by looking at the code.

You must add a description of the MIX architecture to ldb.rc. Following the examples you see there, create an Architecture.mix dictionary that gives the correct meaning to the machine-dependent symbols used by lcc in its symbol tables. At minimum, these should include

Local --- Used with an offset to find local variables on the stack. Note lcc's odd usage on the VAX; it uses Local for both local variables and arguments even though the offsets are with respect to the frame pointer and argument pointer, respectively.
Architecture.current --- The name of the architecture,[*] which should be the same name that the lcc driver passes to the symbol table linker (see page [<-]).
TheRegisters --- A procedure that generates a sequence of (name, stabentry) pairs for inclusion in a symbol table dictionary, using the regsym and fregsym procedures defined in newstab.ps. This procedure defines the names by which users can directly refer to registers, if they desire; it's not used by lcc.

Add the architecture to the list of architectures in Architecture.known. You should not change the initial definition of Architecture.current; the correct initialization is
/Architecture.current (unknown) def  % initialization --- don't touch!

PostScript symbols used by lcc

This section should be moved to the symbol table document. I divide the nonstandard PostScript symbols used by lcc into two classes: utilities and printers. The utilities are:
Symbol --- Defined by --- Description
cvrs2 --- Convert.nw --- Like the PostScript cvrs, but the result string is dynamically allocated and placed on the stack.
Fetch8 --- --- Fetch an 8-bit integer.
Fetch16 --- --- Fetch a 16-bit integer.
Fetch32 --- --- Fetch a 32-bit integer.
Freg --- --- When applied to a register number, produces a location in Location.FloatRegisters. You may want to redefine this symbol in the Architecture.mix.
ixyLocus --- --- Procedure that packs a source location into a single word.
Local --- architecture --- When applied to an offset, produces the location of the local variable with that offset.
mkdict --- --- Procedure that builds a dictionary from key-value pairs on the stack. Usage is mark key val ... mkdict. Will appear in Level 2 PostScript under another name?
procbind --- NewstabOps.nw --- A special operator that acts like PostScript bind, except it binds procedure names rather than operator names. Used by lcc to bind printing procedures for the constituent types of structures, unions, and arrays.
Put --- --- Prints a value on the output.
Reg --- --- When applied to a register number, produces a location in Location.Registers.
RememberProc --- --- Saves a procedure for later use in assembling all the procedures from a single compilation unit.

The lcc back end emits register names, which are usually numbers, into the symbol table. For example, it may emit:
  /where 30 Reg
Some lcc back ends, like that for the SPARC, use names instead of numbers:
  /where i0 Reg
In this case, i0 is defined to be 24 in the Architecture.sparc. If the MIX back end uses names instead of numbers, they have to be defined appropriately in Architecture.mix. If the names begin with %, you've got troubles; you'll have to modify lcc.

The printers are:
Symbol --- Defined by --- Description
FLOAT --- --- Printer for float.
DOUBLE --- --- Printer for double.
CHAR --- --- Printer for char.
UCHAR --- --- Printer for unsigned char.
SHORT --- --- Printer for short.
USHORT --- --- Printer for unsigned short.
INT --- --- Printer for int.
UNSIGNED --- --- Printer for unsigned.
CHARSTAR --- --- Printer for char *.
FUNSTAR --- --- Printer for function pointers.
POINTER --- --- Printer for all other pointers.
STRUCT --- --- Printer for all structs. The field names and offsets and their printers are passed as an additional parameter by lcc.
UNION --- --- Printer for unions, usually identical to that for structs.
FUNCTION --- --- Printer for all function values.
ARRAY --- --- Printer for arrays. The element size, number of elements, and printer for the element type are passed as extra parameters by lcc.

Creating runtime startup code

All Unix systems start execution by executing special assembly-language code called the runtime startup code, usually found in /usr/lib/crt0.o. This code finds the arguments wherever they are, puts them in the right place for main(), and calls main(). For ldb to work, the debug nub linked with users' programs needs to get control before main(). When it gets control it does two things:

The easiest way to arrange for the debug nub to get control before main() is to edit the source for /usr/lib/crt0.o, replacing the call to main with a call to DebugNub_main.[*] If you prefer, you can alter the startup code to call DebugNub_Init(&argc,argv) before calling main. This plan means more work, but it prevents DebugNub_main from occupying a stack frame (and possibly confusing a user). Put the modified startup code in ldb/src/startup/Mixcrt0.s.

If the source isn't available, and you don't feel like getting an adb or dbx and reverse engineering /usr/lib/crt0.o, you can still debug with ldb if the first statement of every main program is DebugNub_Init(&argc,argv);.

The distributed mkfile assembles and installs the startup code and the debug nub at the same time. If you want to get a head start on the startup code, see the instructions on page [->].

Porting the debug nub

The debug nub must handle faults and breakpoints, establish a connection to a debugger, and implement the debug nub end of the Wire.T protocol. The interesting machine-dependent stuff occurs when a fault occurs and the nub notifies the debugger; it passes sig and code, two integers that describe the cause of the fault, and a context, a pointer to a structure describing the state of the faulty thread. On reasonable machines, this context can be the address of a Unix struct sigcontext, which is sufficient to recover the information of interest. On unreasonable machines, like the VAX, one can't recover all the registers from a struct sigcontext, and one must resort to dirty tricks, like that of Reference [cite cormack:micro].
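To make the three values concrete, here is a sketch of a handler that records sig, code, and context when a signal arrives. It is not the code in debugnub/Nub.nw: it uses the POSIX sigaction interface with SA_SIGINFO, where the context arrives as an opaque pointer, instead of the older handler signature that receives a struct sigcontext directly. The names nub_event, last_event, nub_handler, and nub_catch are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of what a nub would report to the debugger. */
struct nub_event {
    volatile sig_atomic_t sig;   /* signal number */
    volatile sig_atomic_t code;  /* further detail about the cause */
    void *volatile context;      /* saved state of the faulting thread */
};

struct nub_event last_event;

/* Record the cause of the signal and the address of the saved context. */
static void nub_handler(int sig, siginfo_t *info, void *context)
{
    last_event.sig = sig;
    last_event.code = info->si_code;
    last_event.context = context;
}

/* Arrange for nub_handler to catch sig; returns 0 on success, -1 on error. */
int nub_catch(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = nub_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
```

A real nub would go on to send these values to the debugger over the Wire.T protocol rather than store them in a global.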

The debug nub has too many machine-dependent #ifdef's in the ``machine-independent'' part. It could be better documented. Nevertheless, here is a short guide to the nub.

The debug nub is written in C. The machine-independent part is in debugnub/Nub.nw, but there are two machine-dependent aspects to this code. One is getting the damned thing to compile. The other is getting a context to the debugger. For compiling, you will probably need non-ANSI include files---bin/mkfile is set up to use lcc -I/usr/include. If the MIX thinks a signal handler returns an int, not a void, you'll have to add it to the list of the machines that do ``typedef int sigreturn;''. If you're fortunate enough to be on a machine where the struct sigcontext contains enough information to reconstruct the top of the stack, you can concentrate on getting the existing nub to compile. Otherwise, you'll have to use the VAX model and suffer deeply; search for #ifdef vax and use the technique of Reference [cite cormack:micro] to recover registers. Within the nub you may want to change the LDBPATHNAME macro to be the pathname of an executable ldb on your system.

The machine-dependent part of the nub should go into MixNub.nw. At minimum, it must define DebugNub_Pause, which is a procedure that traps, causing a Unix signal that is caught in the debug nub and passed on to the debugger. The best trap is a ``user breakpoint'' trap; most machines, including all of the sample implementations, have an instruction that causes such a trap. (The debugger treats any trap that occurs within the special procedure DebugNub_Pause as a pause. This behavior is incorrect since it includes asynchronous signals.) A call to DebugNub_Pause with argument n causes a Pause event in the debugger, with field arg := n. Eventually users may be able to do pattern matching on these argument values, to distinguish different calls to DebugNub_Pause.
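The shape of DebugNub_Pause can be sketched portably. The version below is a stand-in only: it uses raise(SIGTRAP) where a real MixNub.nw would execute the machine's breakpoint instruction, and the int parameter, the pause_arg variable, and install_trap_handler are assumptions made so the example can be run on its own.

```c
#include <signal.h>

/* Argument of the most recent pause, as seen by the handler (hypothetical). */
volatile sig_atomic_t pause_arg = -1;
static volatile sig_atomic_t pending_arg;

/* Stand-in for the nub's signal handler: note which pause occurred. */
static void on_trap(int sig)
{
    (void)sig;
    pause_arg = pending_arg;
}

/* Stand-in for the machine-dependent procedure. A real port would execute
   a ``user breakpoint'' trap instruction here instead of calling raise(). */
void DebugNub_Pause(int n)
{
    pending_arg = n;
    raise(SIGTRAP);
}

/* Install the stand-in handler; the real nub installs its own. */
void install_trap_handler(void)
{
    signal(SIGTRAP, on_trap);
}
```

Calling install_trap_handler() and then DebugNub_Pause(7) leaves 7 in pause_arg, which mirrors how the debugger would see a Pause event with arg := 7.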

The mips/MipsNub.nw contains extra goo to support the MIPS runtime procedure table, and the vax/VaxNub.nw is at least confusing and probably broken, so don't stare at it too much.

Compiling and installing the nub and startup code

Here are the steps for building the nub:[*]

  1. You must make links from src/bin to your nub and startup code:
    cd ldb/src/bin
    ln -s ../startup/Mixcrt0.s ../mix/MixNub.nw .
    
  2. Create a directory to hold the nub object file and its symbol tables, as well as the altered startup code. [*] These objects are architecture-dependent, so you will need a separate directory for each architecture. Edit the mkfile to assign the absolute pathname of the directory to the variable NUBDIR. At different sites, sensible choices of directory might be:
    NUBDIR=/usr/local/lib/ldb/mix
    NUBDIR=/import/ldb/lib/mix
    NUBDIR=$HOME/lib/ldb/mix
    
    The bin/mkfile is carefully written to use absolute pathnames for files in this directory, so that the debugger can find the nub symbol table. Make sure you use the correct absolute pathname in the compiler driver when referring to the nub (see page [<-]).
  3. Continue editing the mkfile. Put in a disambiguation rule for MixNub (near the other such rules):
    MixNub.o: MixNub.c
            lcc -G -I/usr/include -c $prereq
    
  4. Add a virtual rule for your nub. If you needed extra assembly code, as on the VAX, follow the VAX model. Make sure every file in $EXTRANUB uses an absolute pathname beginning with $NUBDIR. If you're not following the VAX model, use:
    mixnub:V:
            mk $MKFLAGS TARGET=Mix EXTRANUB= install-nub
    
    If you intend to build nubs for several different architectures, you may want to set NUBDIR for the recursive calls to mk, as the existing recipes do with TARGET.
  5. You may need a special rule to build your startup code correctly. Check the Makefile in the place you got the source for your Mixcrt0.s. If you need a special rule, put it with the special rules for the nub.
  6. Save the updated mkfile. Assuming you've installed a correct lcc, fasten your seat belt and make and install the nub and startup code: mk mixnub. If you want to walk before you run, mk Mixcrt0.o and mk TARGET=Mix Nub.o do sensible things.

    When compiling Nub.nw you will see warning messages about ``unnamed struct (or union) in prototype.'' These warnings are bogosities generated by the symbol table code; ignore them.

Porting the debugger

Read this section in conjunction with a printout of the sample implementations (MIPS and VAX) and the machine-independent support (cdb/ConfigSupport.nw and cdb/Frame.nw). Such a printout is available as ldb/src/sample.dvi.

For most machines you will need to create only two new modules: MixConfig should define and install an Architecture.T for the Mix architecture, and MixFrame should walk the Mix stack. Both these implementations depend on the structure of the context passed from the debug nub. On the MIPS, this context is a struct sigcontext, and its structure is shown in the type Context and constant Offsets declared in the MipsConfig interface. The procedure ContextLocation converts a context offset and a field name into a Location.T. In addition to context-related declarations, the MipsConfig interface also contains names for register offsets, including ``extra'' or ``virtual'' registers (about which more later).

Installing the architecture

The only requirement for the MixConfig module is that it install an Architecture.T in Architecture.known, so nothing analogous to the MipsConfig interface is necessary. You will, however, find it easier to write the implementation if you write a MixConfig interface declaring ContextLocation and defining Context and Offsets. [ Use ldb/tools/makecontext to generate declarations for Context and Offsets. You must change directory to the NUBDIR directory (where the debug nub symbol tables are installed) before running makecontext, and you must have built the interpreter bin/interp (try ``mk interp'') and installed it on your path. You must have built a debug nub so that the appropriate symbol table can be found. If the gods are on your side, running makecontext should produce the declarations desired. Expect a snooze; it takes a long time to read the nub symbol table.] This interface is a convenient place to centralize other machine-dependent data, as the samples show.

The file cdb/ConfigSupport.nw contains machine-independent support for contexts. This file includes the implementations of ContextLocation and of the object type FieldInteger, a subtype of Server.Integer used to fetch fields from contexts. The code is machine-independent, but it depends on the machine-dependent type Context. I use a literate programming trick to include this code in MipsConfig and VaxConfig. [Modula-3 generics might be used to similar effect.] The mkfile is set up to use that trick automatically with MixConfig; if you don't want to use it you'll have to write a special rule showing how to make MixConfig.m3.

MipsConfig and VaxConfig show the pattern that the MixConfig module should follow; the only changes should be in the fields of the Architecture.T. The architecture name, register counts, and locations of the stack and frame pointers should be straightforward. [The MIPS R2000 has no hardware frame pointer, but the MIPS architecture manual defines what a ``virtual frame pointer'' is. The MipsFrame module computes the value of the virtual frame pointer and makes it available as ``extra register 0.'' Local variables are addressed by offsets from the virtual frame pointer, as you can see from the PostScript code for Architecture.mips.]

The existing implementations of breakpoints work by replacing a no-op instruction with a trap instruction. They rely on lcc to place no-ops at the places in the instruction stream that represent execution points, and they assume that a no-op and a trap can be made the same size. Such an implementation can be created by passing a NopBreakpoint.Specification to NopBreakpoint.New. The specification includes the Machine.Type used to fetch and store the instructions, the size of the instructions, and the bit patterns that represent the instructions. The sizes as well as the bit patterns differ among the samples. One can discover the bit patterns by checking in an architecture manual, but it may be easier to assemble the appropriate instructions and peek at the resulting object code with adb or dbx, where available. This trick is especially useful on machines like the MIPS, which has no instruction reserved for no-op; the instruction used is a matter of convention. On some machines, like the MIPS, the trap instruction requires an argument (typically an immediate constant); take care to select the argument that indicates a ``user breakpoint.''
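The no-op replacement scheme can be illustrated in C. This sketch is not the NopBreakpoint module, which is written in Modula-3; it assumes 32-bit instructions held in a local array, and the names nop_breakpoint_spec, plant_breakpoint, and remove_breakpoint are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counterpart of a breakpoint specification, assuming that
   the no-op and the trap are both one 32-bit instruction word. */
struct nop_breakpoint_spec {
    uint32_t nop;   /* bit pattern of the no-op the compiler emits */
    uint32_t trap;  /* bit pattern of the ``user breakpoint'' trap */
};

/* Replace the no-op at code[index] with a trap.
   Returns 0 on success, -1 if that word is not a no-op. */
int plant_breakpoint(const struct nop_breakpoint_spec *s,
                     uint32_t *code, size_t index)
{
    if (code[index] != s->nop)
        return -1;
    code[index] = s->trap;
    return 0;
}

/* Restore the no-op at code[index].
   Returns 0 on success, -1 if no breakpoint is planted there. */
int remove_breakpoint(const struct nop_breakpoint_spec *s,
                      uint32_t *code, size_t index)
{
    if (code[index] != s->trap)
        return -1;
    code[index] = s->nop;
    return 0;
}
```

Checking for the no-op before writing is what ties this scheme to lcc: a breakpoint can be planted only where the compiler left a slot for one.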

The Server.Implementation gets information from the debug nub and transforms it into an Event.T, as well as holding the thread of control for later resumption. From the debug nub it gets two integers, sig and code, that describe the nature of the event, as well as the address of a context (the same described by MixConfig.Context). It needs to be able to perform the following machine-dependent operations, whose semantics are defined by a Server.Specification:

The ``procedure table'' is the closest thing ldb has to a linker interface. The key operation it must support is mapping a program counter to a procedure (Symbol.Procedure). That procedure may be decorated with machine-dependent data available only at link time (or at run time). Most targets, like the VAX, will use the generic procedure table, which is automatically included in the stab by the lcc driver (provided -proctable is passed to linkstab, as described earlier). Other targets may require specialized procedure tables. The MIPS has no frame pointer, and the calling conventions are such that ldb needs to know how big each stack frame is, which registers are saved, and so on. I could have altered the lcc back end to record this information, but then ldb couldn't have debugged a stack containing procedures that weren't compiled with lcc -G. Since the MIPS architects provided a runtime procedure table for use by exception software and other programs that need to unwind the stack, I use it (see mips/MipsProcedureTable.nw).
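The key operation, mapping a program counter to a procedure, amounts to finding the last procedure that starts at or below the pc. The following Python sketch shows one way to do that with a sorted table and binary search. It is only an illustration: the class and its (start address, name) representation are assumptions, not the layout of ldb's generic procedure table.

```python
import bisect


class ProcedureTable:
    """Illustrative pc-to-procedure map; not ldb's actual data structure."""

    def __init__(self, procs):
        # procs: iterable of (start_address, procedure_name) pairs
        self._procs = sorted(procs)
        self._starts = [start for start, _ in self._procs]

    def lookup(self, pc):
        """Return the procedure whose start address is the greatest one <= pc,
        or None if pc lies below every known procedure."""
        i = bisect.bisect_right(self._starts, pc) - 1
        if i < 0:
            return None
        return self._procs[i][1]
```

A specialized table, such as the one needed for the MIPS, would attach per-procedure data (frame size, saved-register masks, and so on) to each entry instead of just a name.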

Stack frame manipulation

Every architecture must supply a procedure that takes a context and returns a Frame.T. The full machine-independent part of Frame.T is exposed in the FrameRep interface and implemented in the Frame module, but that implementation lacks two methods: one that finds the caller's frame and one that restores registers. Finding the caller's frame is usually straightforward; restoring registers can be more complex.

The debug nub provides access to memory only; it exports a Machine.T in which only locations in code and data segments are defined. The abstract machine associated with a frame (frame.absMach) should correspond with the abstraction provided by the compiler back end, which includes registers. These registers are found in one or more register files (a Location.Space can denote a register file), typically Location.Registers and perhaps some others. In ldb, locations in these register files are aliases for addresses in the target memory where the registers have been saved.

ldb and the debug nub must agree on a location (in the context) where registers of potential value are saved, and the debug nub must save the registers there when a fault occurs. (Operating systems save registers when a fault occurs, but they don't always make the values accessible to the debug nub's fault handler; see Reference [1] for a way to recover the registers in this case.) The AliasedMachine interface is used to bind register locations to the agreed-upon locations in the target data segment; registers that are not bound to any addresses are said to be ``not live.'' Typically registers are live if and only if they are preserved across procedure calls; one hopes that in the top frame all registers will be live.
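The idea of registers as aliases for saved locations can be sketched as follows. This Python fragment is an illustration only: AliasedRegisters and NotLive are hypothetical names, the data segment is modelled as a bytearray, and a 32-bit little-endian word is assumed. It is not the AliasedMachine interface itself.

```python
import struct


class NotLive(Exception):
    """Raised on access to a register that is bound to no address."""


class AliasedRegisters:
    """Registers as aliases for word-sized slots in target memory."""

    def __init__(self, memory, word_format="<I"):
        self.memory = memory            # bytearray standing in for the data segment
        self.word_format = word_format  # assumed: 32-bit little-endian words
        self.bindings = {}              # register number -> address of its saved copy

    def bind(self, reg, addr):
        self.bindings[reg] = addr

    def fetch(self, reg):
        if reg not in self.bindings:
            raise NotLive(reg)
        (value,) = struct.unpack_from(self.word_format, self.memory,
                                      self.bindings[reg])
        return value

    def store(self, reg, value):
        # Storing into a register really stores into the saved copy, which
        # the nub would reload when the target resumes.
        if reg not in self.bindings:
            raise NotLive(reg)
        struct.pack_into(self.word_format, self.memory,
                         self.bindings[reg], value)
```

For example, binding registers 0 through 2 to consecutive words of a saved context makes those three registers live, while a fetch from register 3 raises NotLive.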

Fetching a single byte from a register is problematic; the actual address of that byte in the target depends not only on the location of the register but also on the endianness of the target. The RegisterMachine interface solves this problem by transforming all fetches from a register into fetches of complete words, then extracting bits or combining words as necessary.
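A small Python sketch shows why fetching the whole word first sidesteps the endianness problem. The function name and parameters are hypothetical, and a 4-byte register is assumed; this is not the RegisterMachine interface.

```python
def fetch_register_bits(memory, reg_addr, lo_bit, width, byteorder, word_size=4):
    """Fetch `width` bits starting at bit `lo_bit` (bit 0 = least significant)
    of the register saved at reg_addr.

    The whole word is fetched in the target's byte order and the bits are
    then extracted arithmetically, so the caller never has to work out
    which memory byte holds which part of the register.
    """
    word = int.from_bytes(memory[reg_addr:reg_addr + word_size], byteorder)
    return (word >> lo_bit) & ((1 << width) - 1)
```

With the value 0x11223344 saved in a register, the low-order byte comes back as 0x44 whether the target stored the word big-endian or little-endian, even though that byte sits at a different address in each case.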

Implementation of MixFrame should proceed by modifying one of the sample implementations, which I will now discuss, beginning with the VAX (the least complex).

VAX overview

The VAX frame type (VaxFrame.T) extends Frame.T with several useful fields (in addition to the required proc and pc):
fp --- The frame pointer.
stackmem --- Machine representing the stack. Locations in frame.aliases are aliases for locations here.
aliases --- Forms part of frame.absMach, and holds bindings of registers for possible use by the calling frame.

Any time a Frame.T is created, all its non-default fields must be given values.

All of the sample implementations have stackmem and aliases, but I didn't want to institutionalize a requirement for those fields in the FrameRep interface. They all use the same trick to prevent register fetches from following a chain of aliases along the call stack: if a register is left untouched in a frame f, then f's caller uses the same alias for that register that f used.

The VAX uses the type RegIndex to represent a register, and the type RegSet to represent a set of registers (register save mask). Similar types should be useful on any machine that employs register save masks. It also defines a constant RegFiles for use by RegisterMachine.New, and AliasFiles and AliasSizes, for use by AliasedMachine.New. The aliased spaces are added to the basic machine from the debug nub (which supports code and data) by Machine.Join. RegFiles shows which spaces are treated as registers. You should adapt these type and constant declarations to the MIX.

The VAX context is defined in the VAX debug nub as a pointer to 16 words containing the registers saved from the faulting thread. This decision is recorded in the declaration of VaxConfig.Context, where the 16 registers are given their customary names. VaxFrame.New, like the frame creation procedures for other targets, is required to create a frame f with f.absMach # NIL; the default value may not be used. It therefore creates an abstract machine in which the registers are aliases for locations in the context. The other fields of the frame are computed or fetched from the target in a straightforward way.

Finding the caller's frame, done by VaxFrame.SetCaller, requires creating a new frame and again setting all the nondefault values, but it is not necessary to compute the abstract machine absMach; that is done lazily by VaxFrame.SetAbsMach. SetCaller must raise an exception when it has reached the bottom of the stack. The bottom-of-stack condition on the VAX is not well documented, and this makes VaxFrame.SetCaller a bit of a crock. Otherwise, discovering the return address and frame pointer is straightforward. The return address is adjusted by a constant offset to produce the ``calling pc,'' a strange abstraction intended to represent ``the address of the call.'' It is described more fully in cdb/Frame.nw. The offset, PCOffset, when subtracted from the return address, should produce the address of a no-op immediately preceding the call instruction (assuming such exists). [The ``calling PC'' is probably a mistake, but it's what I have for now.] The calling PC is then adjusted to make sure it hasn't moved outside the calling procedure, and the new frame is returned.
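The arithmetic behind the calling PC can be sketched in a few lines of Python. The text above does not say exactly how the final adjustment is made, so clamping to the first address of the calling procedure is an assumption here, as are the function and parameter names.

```python
def calling_pc(return_address, pc_offset, proc_start):
    """Back up from the return address by pc_offset, then make sure the
    result has not moved outside (below the start of) the calling procedure.

    proc_start is the first address of the procedure that contains
    return_address. Clamping to it is one plausible reading of the
    adjustment, not necessarily what VaxFrame.SetCaller does.
    """
    pc = return_address - pc_offset
    return max(pc, proc_start)
```

The clamp matters only when the call is so close to the start of the procedure that subtracting the offset would land in the preceding procedure.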

Recovering the VAX registers requires elaborate decoding of the status word left on the stack. The format of the word is described in the VAX architecture manual, but it should be clear from the code, since I have defined variables that correspond to the important fields. The scheme of VaxFrame.SetAbsMach is followed by all of the sample implementations.
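For orientation, here is a Python sketch of the decoding, assuming the status word is the longword that the VAX call instructions leave at offset 4 from the frame pointer, and assuming the field layout as I understand it from the architecture manual (SPA in bits 31:30, the CALLS/CALLG flag in bit 29, the register save mask in bits 27:16, and the PSW in bits 15:5). The names are hypothetical and do not correspond to the variables in VaxFrame.

```python
from typing import NamedTuple


class StatusWord(NamedTuple):
    spa: int     # stack pointer alignment: low 2 bits of SP before alignment
    calls: bool  # True if the frame was built by CALLS, False for CALLG
    mask: int    # register save mask, bit r set if register r was saved
    psw: int     # saved PSW bits 15:5, left in place (low 5 bits are zero)


def decode_status_word(word):
    """Split the mask/PSW longword of a VAX call frame into its fields."""
    return StatusWord(
        spa=(word >> 30) & 0x3,
        calls=bool((word >> 29) & 0x1),
        mask=(word >> 16) & 0xFFF,
        psw=word & 0xFFE0,
    )
```

The spa and calls fields are what a debugger needs to reproduce the stack-pointer arithmetic of the return instruction; the mask says which general registers have copies on the stack.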

Restoring registers follows the VAX calling conventions. Registers 0 through 11 are for general use, and are saved on the stack if and only if mentioned in the register save mask (converted to the set saved). If a register in the range 0...5 is not saved, it is lost (``not live'') and not bound in frame.aliases. Registers 6...11 must be preserved across procedure calls, so if they are not saved, frame uses the same aliases used in the callee, found in prevbind. [This is the chain-avoiding trick described above.] The argument pointer is found on the stack. The frame pointer and stack pointer shouldn't be changed by users, so they are bound to appropriate Raw values. Recovering the stack pointer is a bit complicated; I have duplicated the semantics of the VAX return instruction, then bound the stack pointer to the Raw result.
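The binding policy for the general registers can be written down compactly. The Python sketch below is illustrative only: bindings are modelled as a dictionary from register number to stack address, the names are hypothetical, and the argument pointer, frame pointer, and stack pointer are left out.

```python
def bind_general_registers(saved, saved_addr, prevbind):
    """Compute the caller's register aliases for r0..r11.

    saved      -- set of register numbers named in the callee's save mask
    saved_addr -- dict: register number -> stack address of its saved copy
    prevbind   -- dict: the aliases the callee's frame used
    """
    aliases = {}
    for r in range(12):
        if r in saved:
            # Saved by the callee: the caller's value lives on the stack.
            aliases[r] = saved_addr[r]
        elif r >= 6 and r in prevbind:
            # Preserved across calls and untouched by the callee: reuse the
            # callee's alias directly, so fetches never follow a chain.
            aliases[r] = prevbind[r]
        # Otherwise (r0..r5 unsaved, or never bound) the register is not
        # live and gets no binding at all.
    return aliases
```

Reusing the callee's alias, rather than pointing at the callee's frame, keeps a register fetch to a single lookup no matter how deep the stack is.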

The function VaxFrame.MaskToSet captures the semantics of the VAX register save mask. VaxFrame.SavedOffset uses the mask to compute the offset of a given saved register. Similar functions will be useful on any target with register save masks.
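As a rough illustration of what such functions compute, here is a Python sketch. It assumes the VAX call frame layout as I understand it from the architecture manual: five longwords (handler, mask/PSW, AP, FP, PC) starting at the frame pointer, followed by the saved registers in ascending register order from offset 20. The names mirror MaskToSet and SavedOffset, but the code is not taken from VaxFrame.

```python
FIRST_SAVED_OFFSET = 20  # assumed: bytes from FP to the first saved register
WORD = 4                 # bytes per saved register


def mask_to_set(mask):
    """Return the set of register numbers (0..11) named in a save mask."""
    return frozenset(r for r in range(12) if (mask >> r) & 1)


def saved_offset(mask, reg):
    """Offset from FP of the saved copy of reg, given the frame's save mask.

    Saved registers are stored lowest-numbered first, so the slot for reg
    follows one slot for each lower-numbered register that was also saved.
    """
    if not (mask >> reg) & 1:
        raise ValueError("register %d is not in the save mask" % reg)
    lower_saved = bin(mask & ((1 << reg) - 1)).count("1")
    return FIRST_SAVED_OFFSET + WORD * lower_saved
```

With a mask naming registers 2, 6, and 7, the three saved copies sit at offsets 20, 24, and 28 from the frame pointer.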

MIPS overview

The MIPS differs from the VAX in one important respect: it has no frame pointer. This has two consequences for MipsFrame: it must use machine-dependent data to discover the size of stack frames and walk the stack, and it must simulate a ``virtual frame pointer'' in software (because lcc stores local variables relative to this virtual frame pointer). [I didn't do it---the virtual frame pointer is discussed in the MIPS architecture manual, Appendix D.] MipsFrame.T differs from VaxFrame.T only in that it saves a stack pointer instead of a frame pointer.

The MIPS type declarations show two new register files, the floating-point registers and the ``extra registers.'' MipsFrame.New shows the binding of these files, in addition to the things we saw in VaxFrame.New. The machine-dependent attribute "framesize" is used to compute the value of the virtual frame pointer. The MIPS context is a struct sigcontext, holding not just the registers but also additional information.

MipsFrame.SetCaller uses several pieces of machine-dependent data. This data is supplied by the loadData method of the MIPS ProcedureTable.Implementation. The return address, stored in pc, is either zero, or saved on the stack, or currently in a register. The new stack pointer is obtained by adding the frame size to the current stack pointer. The computation of frame.proc and the adjustment of the return address to determine the ``calling PC'' proceed as usual.
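The stack walk can be sketched in Python as follows. This is an illustration under several assumptions: the per-procedure data is reduced to a frame size and an optional stack offset for the saved return address, a zero return address is taken to mean the bottom of the stack, and all names are hypothetical. Only the use of register 31 as the return-address register is standard MIPS convention.

```python
class BottomOfStack(Exception):
    """Raised when there is no caller to return to."""


RA = 31  # MIPS return-address register


def find_caller(sp, framesize, read_word, regs, ra_offset=None):
    """Return (return_address, caller_sp) for the frame whose stack pointer is sp.

    framesize -- size of the current frame, from the procedure table
    read_word -- function fetching one word of target memory by address
    regs      -- dict of current register values, used when the return
                 address has not been saved on the stack (a leaf procedure)
    ra_offset -- offset from sp of the saved return address, or None if it
                 is still in register 31
    """
    if ra_offset is not None:
        ret = read_word(sp + ra_offset)
    else:
        ret = regs[RA]
    if ret == 0:  # assumed bottom-of-stack convention
        raise BottomOfStack()
    return ret, sp + framesize


def virtual_frame_pointer(sp, framesize):
    """The virtual frame pointer is the stack pointer plus the frame size."""
    return sp + framesize
```

Note that the caller's stack pointer and the current frame's virtual frame pointer are the same sum, which is why no hardware frame pointer is needed once the frame size is known.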

MipsFrame.SetAbsMach requires machine-dependent data to restore both general-purpose and floating-point registers, and to bind the ``extra registers.'' The restoration of the registers is similar to that for the VAX, although the details differ in uninteresting ways.

Compiling a new ldb

Here are the steps:
  1. Link your source files into the bin directory and mention them in the mkfile. There's a shell script that makes the link and adds interface and module to the appropriate lists in the mkfile:
    cd ldb/src/bin
    addfile MixConfig MixFrame [...]
    
  2. Edit bin/mkfile to show what interfaces and modules are needed to support the MIX architecture. With luck, all you will need will be
    MIX=MixFrame.io  MixFrame.mo \
        MixConfig.io MixConfig.mo
    
    Then, add $MIX to the list of prerequisites for the target ldb.
  3. Edit the assignment to PSDIR at the top of mkfile to show ldb where to find its PostScript files. You should define PSDIR to be the absolute pathname of the ldb/lib directory in the distribution (unless you choose to move the files to a ``famous files'' directory), for example
    PSDIR=/import/ldb/lib
    

You're now ready to compile. With luck, ``mk ldb'' should get you a new ldb. ``mk -k ldb'' will proceed as far as possible in the face of errors.

References

[1] Gordon V. Cormack. A micro-kernel for concurrency in C. Software---Practice & Experience, 18(5):485--491, May 1988.