Contents

MIPS instruction-set specification

This section is a complete specification of the MIPS instruction set. The MIPS is a good machine to start with because it has a small, regular instruction set. Its specification requires only the toolkit's basic features, and the regularity of the instruction set lends itself to factoring, making the specification small. As in the MIPS manual [cite kane:mips], the specification is divided into two parts: one describing the instructions of the main CPU, the other describing those of the floating-point coprocessor.

Tokens and fields

The MIPS is a RISC machine; all instructions occupy exactly one 32-bit token. Although the instructions of the floating-point coprocessor are listed separately in the manual, they use some of the same fields as the instructions of the main CPU. This property makes it possible to define all the fields in a single token class. We use noweb to distribute the field specifications, putting most in the CPU section and the rest in the coprocessor section.

fields specifications give only the positions of fields. fieldinfo specifications give more information; for example, we use special names to refer to the values of fields that denote registers. fieldinfo specifications are never required.

The core MIPS specification describes the fields first, then patterns and constructors. We tell noweb to put it in a file called mips-core.spec.

<mips-core.spec>=
fields of instruction (32) <field specifications>
<fieldinfo specifications>
<pattern and constructor specifications>

The MIPS CPU

A field is specified by giving a range of bit positions. Some of the MIPS fields are listed on page A-3 of the MIPS architecture manual; others, like breakcode, apply to only a couple of instructions. The MIPS manual sometimes uses more than one name for the same field, for example, offset and base are used instead of immed and rs to describe load and store instructions. The numbers in the specification are the starting and ending bit positions, where 0 is the least and 31 the most significant bit.

<field specifications>= (U->) [D->]
word 0:31 op 26:31 rs 21:25 rt 16:20 immed 0:15 offset 0:15 base 21:25 
target 0:25 rd 11:15 shamt 6:10 funct 0:5 cond 16:20 breakcode 6:25

Special names are associated with the integer registers. These names can be used in constructor applications, as shown in the definitions of synthetic instructions in Section [->]. These names can also be used to generate symbolic disassemblers, and ``encoding'' procedures that emit assembly language. On the MIPS, we use names of the form rn, except for the stack pointer (sp).

<properties of integer-register fields>= (U->)
names [ r0  r1  r2  r3  r4  r5  r6  r7  r8  r9  r10 r11 r12 r13 r14 r15
        r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 sp  r30 r31 ]

These names are just one of the properties that could be associated with the integer registers, which are rs, rd, rt, and base.

<fieldinfo specifications>= (U->) [D->]
fieldinfo [ rs rt rd base ] is [ <properties of integer-register fields> ]

Opcodes

An opcode is a pattern that constrains the values of one or more fields. Defining opcodes individually would be tedious, and the result would be hard to compare with the architecture manual, which uses opcode tables. The patterns declaration binds a table of names when a generating expression appears on the right-hand side, letting us write declarations that resemble the tables in the MIPS manual.

The numeric codes for the MIPS opcodes are described in three tables on page A-87 of the MIPS architecture manual. These tables are reproduced in Figure [->]. Normal opcodes are six bits, and they appear in the op field of the instruction. This statement names patterns for each of the opcodes in the table at the top of Figure [->].

<pattern and constructor specifications>= (U->) [D->]
patterns
 [ special bcond   j       jal     beq     bne     blez    bgtz
   addi    addiu   slti    sltiu   andi    ori     xori    lui
   cop0    cop1    cop2    cop3    _       _       _       _
   _       _       _       _       _       _       _       _
   lb      lh      lwl     lw      lbu     lhu     lwr     _
   sb      sh      swl     sw      _       _       swr     _
   lwc0    lwc1    lwc2    lwc3    _       _       _       _
   swc0    swc1    swc2    swc3    _       _       _       _ ] 
 is op = {0 to 63}

This statement creates names for patterns constraining the op field. The expression ``op = {0 to 63}'' generates a list of 64 patterns, ranging from ``op = 0'' to ``op = 63''. Each pattern is associated with the corresponding name in the list to the left-hand side of is. For example, an instruction matches the pattern beq if its op field is 4. Patterns associated with the name ``_'' are discarded.


--- ------ --- to 5em# --- |#
--- --- 28..26 --- Opcode --- --- ---
31..29 --- --- 0 --- --- 1 --- --- 2 --- --- 3 --- --- 4 --- --- 5 --- --- 6 --- --- 7 ---
--- --- 2..0 --- SPECIAL --- --- ---
5..3 --- --- 0 --- --- 1 --- --- 2 --- --- 3 --- --- 4 --- --- 5 --- --- 6 --- --- 7 ---
--- --- 18..16 --- BCOND --- --- ---
20..19 --- --- 0 --- --- 1 --- --- 2 --- --- 3 --- --- 4 --- --- 5 --- --- 6 --- --- 7 ---
--- ---
--- ---
--- ---
--- --- --- ---
Opcode tables from MIPS architecture manual. [*]

Two opcodes, special and bcond, are used for several instructions. These instructions are decoded by checking the bit-pattern in the funct and cond fields of the instructions, respectively. The following statement names patterns for each of the special opcodes in the table in the center of Figure [<-]:

<pattern and constructor specifications>+= (U->) [<-D->]
patterns
 [ sll     _       srl     sra     sllv    _       srlv    srav
   jr      jalr    _       _       syscall break   _       _
   mfhi    mthi    mflo    mtlo    _       _       _       _
   mult    multu   div     divu    _       _       _       _
   add     addu    sub     subu    and     or      xor     nor
   _       _       slt     sltu    _       _       _       _ ] 
 is  special & funct = {0 to 47}

The bcond table at the bottom of Figure [<-] is sparse. Instead of writing a table with many empty entries, we use another form of generating expression on the right-hand side; the list of integers in square brackets generates each integer in turn.

<pattern and constructor specifications>+= (U->) [<-D->]
patterns
 [ bltz bgez bltzal bgezal ] is bcond & cond = [ 0 1 16 17 ]

These statements create new patterns by adding further constraints to the existing patterns special and bcond.

A slightly different way to look at this style of specification is that ``special'' and ``bcond'' are the names of tables, not opcodes. As we'll see, that view is a better one for the SPARC.

Grouping instructions

The rest of the patterns organize instructions into groups, much as is done in Chapter 3 of the MIPS manual. Most instructions are grouped by assembly-language syntax. Immediate-mode instructions are grouped into signed and unsigned variants (immedS and immedU), depending on whether the immediate operand is sign-extended. The toolkit uses disjunction to define patterns that match any of a group of related instructions.

<pattern and constructor specifications>+= (U->) [<-D->]
patterns
  load    is lb | lbu | lh | lhu | lw | lwl | lwr | sb | sh | sw | swl | swr 
  immedS  is addi | addiu | slti | sltiu
  immedU  is andi | ori | xori 
  arith3  is add | addu | sub | subu | slt | sltu | and | or | xor | nor 
  shift   is sll | srl | sra
  vshift  is sllv | srlv | srav 
  arith2  is mult | multu | div | divu
  mfhilo  is mfhi | mflo 
  mthilo  is mthi | mtlo
  jump    is j  | jal
  jumpr   is jr | jalr
  branch1 is blez | bgtz | bltz | bgez | bltzal | bgezal
  branch2 is beq | bne
  copls   is lwc0 | lwc1 | lwc2 | lwc3 | swc0 | swc1 | swc2 | swc3

Constructors for simple instructions

The patterns that group instructions can now be used to define constructors. A constructor specification contains an opcode, operands, possibly a set of equations, and an output pattern. The operands correspond to the procedure's arguments, and when the procedure is called, it emits the sequence of tokens defined by the output pattern. The toolkit generates an encoding procedure for each constructor. When the constructor name on the left-hand side is the name of a pattern, each disjunct of the pattern is used to generate a constructor. For example, the first declaration below generates a constructor for each disjunct bound to the pattern load, i.e., lb, lbu, lh, etc.

Most output patterns for the MIPS constructors are conjunctions of the opcode and all the operands. For example, the load instructions could be described by

load rt, offset!(base) is load & rt & offset & base.
Because this idiom is so common, not only on the MIPS but on other machines as well, the toolkit provides a special abbreviation for it: if the output pattern is omitted, the conjunction is inferred. We can use this trick to specify almost all of the integer instructions; only the one- and two-operand branches (branch1 and branch2) and the jumps need to be treated specially. Given the groupings we defined above, we specify 56 instructions in just 15 lines: [This is the one place in specifications where newlines can be significant, preventing the opcode of a following pattern be taken as an operand of the current pattern. The reference manual explains the lexical details.]

<pattern and constructor specifications>+= (U->) [<-D->]
constructors
  load   rt, offset!(base)
  immedS rt, rs, offset!
  immedU rt, rs, offset
  lui    rt, offset
  arith3 rd, rs, rt
  shift  rd, rt, shamt
  vshift rd, rt, rs
  arith2 rs, rt
  mfhilo rd
  mthilo rs
  syscall
  break  breakcode
  copls  ft, offset!(base)
  jr     rs
  jalr   rd, rs

Operands with a trailing ! (e.g., offset!) are treated as signed parameters in encoding procedures; the toolkit implicitly narrows their values to produce unsigned fields.

These constructor specifications include the assembly-language syntax associated with the instructions being specified. For example, most operands are separated by commas, but the syntax offset(base) is used to suggest the address computation in the load and store instructions. Using assembly syntax in the constructor specification makes the specification look more like the architecture manual. We use the assembly syntax to produce symbolic disassemblers and ``encoding'' procedures that emit assembly language.

Branches and relocatable addresses

Because it's a RISC machine, the MIPS can't use target addresses directly as field values in branch instructions; a 32-bit address wouldn't leave room for an opcode, or anything else. Instead, the branches are relative to the program counter; the field corresponding to the target address records the difference between that address and the PC. We use equations to specify the relationships between the operands and fields.

<pattern and constructor specifications>+= (U->) [<-D->]
<placeholder definition>
relocatable reloc
constructors
  branch1 rs, reloc
     { reloc = L + 4 * offset! } is branch1 & rs & offset; L: epsilon
  branch2 rs, rt, reloc
     { reloc = L + 4 * offset! } is branch2 & rs & rt & offset; L: epsilon

The branch equations capture the semantics of the MIPS branch instructions; the offset is sign-extended, shifted left 2 bits, and added to the address of the delay slot, i.e., L.

Constructor operands designated relocatable are relocatable addresses, like the branch targets named reloc above. Pattern labels, like L above, are also relocatable addresses. When a constructor that uses relocatable addresses is applied, it checks to see if those addresses are known (i.e., they have been assigned absolute adresses). If so, it treats them as ordinary integers and emits the instruction. Otherwise, it emits placeholder tokens and creates a relocation closure. The placeholders will be overwritten later, when the address becomes known and the application applies the closure.

We have to specify a placeholder pattern for tokens of each class. On the MIPS, we have only one class, so we need one placeholder. We've picked a break instruction with code 99, so that we'll get a trap if by some mischance the placeholder sneaks into a real program and we try to execute it.

<placeholder definition>= (<-U)
placeholder for instruction is break(99)

The syntax ``break(99)'' isn't a special thing we use for placeholders; it's just the pattern we get by applying the break constructor to the value 99.

The PC-relative branches can't jump anywhere in the MIPS address space, only within 2^16 words of the PC. The MIPS provides a jump instruction that lets a program jump to any word in the same quarter of the address space as the PC. It's best specified using a lot of bit operations; the toolkit provides a notation for taking bit slices of values. For example, reloc@[28:31] denotes the four most significant bits of the 32-bit, relocatable address reloc. The description of j on page A-31 in the MIPS manual corresponds to the equations below.

<pattern and constructor specifications>+= (U->) [<-D->]
constructors
  jump reloc { reloc@[28:31] = L@[28:31], 
               reloc@[2:27]  = target, 
               reloc@[0:1]   = 0 } 
         is L: jump & target

The high bits of the target address must match the current address, the middle bits give the target field, and the low bits must be zero.

Synthetic instructions and conditional assembly

[*] We have completed our description of the MIPS integer instructions, but MIPS applications often use ``synthetic'' instructions, i.e., instructions that are in the assembly language but not in the hardware instruction set. We specify constructors for synthetic instructions by defining them in terms of existing constructors. We put synthetic instructions in a separate specification, because not all applications need synthetics.

<mips-synth.spec>= [D->]
constructors
  nop           is  sll (r0, r0, 0)
  mov rd, rs    is  addu(rd, rs, r0)
  b reloc       is  beq (r0, r0, reloc)

The earlier fieldinfo specification permits us to use names, not numbers, to refer to register values.

There is a large collection of synthetic branch instructions; they combine comparisons and branches. The result of comparisons is left in register 1, which MIPS conventions reserve for the assembler.

<mips-synth.spec>+= [<-D->]
constructors
  bge  rs, rt, reloc  is  slt (r1, rs, rt);  beq(r1, r0, reloc)
  bgeu rs, rt, reloc  is  sltu(r1, rs, rt);  beq(r1, r0, reloc)
  blt  rs, rt, reloc  is  slt (r1, rs, rt);  bne(r1, r0, reloc)
  bltu rs, rt, reloc  is  sltu(r1, rs, rt);  bne(r1, r0, reloc)
  bleu rs, rt, reloc  is  sltu(r1, rt, rs);  beq(r1, r0, reloc)
  ble  rs, rt, reloc  is  slt (r1, rt, rs);  beq(r1, r0, reloc)
  bgt  rs, rt, reloc  is  slt (r1, rt, rs);  bne(r1, r0, reloc)
  bgtu rs, rt, reloc  is  sltu(r1, rt, rs);  bne(r1, r0, reloc)

A full multiplication requires fetching the result from a special register.

<mips-synth.spec>+= [<-D->]
constructors
  mul rd, rs, rt  is  multu(rs, rt); mflo(rd)

Sometimes the best expansion for a synthetic instruction depends on the values of operands. We can choose one of several expansions by putting alternatives on the right-hand side of a constructor specification, each with its own set of equations. Each application of the constructor uses the first alternative for which the equations can be solved. For example, the li (load immeditate) synthetic instruction has three ways to load a signed value imm into register rt. When imm fits in 16 bits, use an immediate mode addiu instruction where the second operand is register 0, which is always 0. When the low-order 16 bits of imm are zero, use lui to assign the high-order bits to rt; lui automatically zero's the low-order bits. In the general case, cut imm into slices and use two instructions: lui to assign the high-order bits and addiu to add in the low-order bits.

<mips-synth.spec>+= [<-D->]
constructors
  li rt, imm  
    when { imm@[16:31]! = imm@[15]! } is addiu(rt, r0, imm@[0:15]!)
    when { imm@[0:15] = 0 }           is lui  (rt, imm@[16:31])
    otherwise is lui(rt, imm@[16:31] + imm@[15]); addiu(rt, rt, imm@[0:15]!)

For the third branch, the toolkit generates an unnecessary test to see that imm@[16:31] + imm@[15] doesn't overflow 16 bits. The test isn't needed, because the only way it can happen is when everything in sight is ones, and in that case we take the first branch. I'd love to have a symbolic-algebra system smart enough to figure this out.

The floating-point coprocessor

Pages B-5 through B-7 of the MIPS manual introduce a few more field names for the convenience of specifying the floating-point instructions. The integer-register fields rd, rs, and rt could be re-used to refer to fields of floating-point instructions, but we introduce new fields fd, fs, and ft in order to have different names for the registers.

<field specifications>+= (U->) [<-D->]
ft 16:20 fs 11:15 fd 6:10 format 21:24 bit25 25:25
<fieldinfo specifications>+= (U->) [<-D->]
fieldinfo [ fs ft fd ] is [ <properties of floating-point-register fields>]
<properties of floating-point-register fields>= (<-U)
names [ f0  f1  f2  f3  f4  f5  f6  f7  f8  f9  f10 f11 f12 f13 f14 f15
        f16 f17 f18 f19 f20 f21 f22 f23 f24 f25 f26 f27 f28 f29 f30 f31 ]

The format field has only three legitimate values: 0, 1, and 4. The sparse field attribute associates the names s, d, and w (signifying single, double, and word, respectively) with these values.

<fieldinfo specifications>+= (U->) [<-D]
fieldinfo format is [ sparse [ s = 0, d = 1, w = 4 ] ]

The instruction codes for Coprocessor 1 (floating point) are given on page B-28 of the MIPS manual. We use names like ``add.'' instead of ``add.fmt'' because they make it possible to form the full name of the instruction by concatenating the name of the opcode pattern and the name of the format, e.g., add.s, add.d, etc. They also distinguish the floating-point opcodes from analogous integer opcodes.

<pattern and constructor specifications>+= (U->) [<-D->]
patterns
 [ add.    sub.    mul.    div.    _       abs.    mov.    neg.   
   _       _       _       _       _       _       _       _
   _       _       _       _       _       _       _       _
   _       _       _       _       _       _       _       _
   cvt.s   cvt.d   _       _       cvt.w   _       _       _
   _       _       _       _       _       _       _       _
   c.f     c.un    c.eq    c.ueq   c.olt   c.ult   c.ole   c.ule
   c.sf    c.ngle  c.seq   c.ngl   c.lt    c.nge   c.le    c.ngt ] 
 is cop1 & funct = {0 to 63} & bit25 = 1

Specifying the branch and move instructions is more complicated, because the relevant codes span several entries in the table, and the pattern language is designed to bind one pattern to one entry. We introduce two new fields to simplify the specification.

<field specifications>+= (U->) [<-D]
cop1code 22:25 copbcode 16:16
<pattern and constructor specifications>+= (U->) [<-D->]
patterns
  [ mfc1 cfc1 mtc1 ctc1 ] is cop1 & cop1code = {0 to 3} & funct = 0
  bc1x is cop1 & (cop1code = 4 | cop1code = 6)
  bc1f is bc1x & copbcode = 0
  bc1t is bc1x & copbcode = 1

The pattern bc1x is under-constrained. The architecture manual indicates that the cop1code field for the bc1f and bc1t instructions can have the value 4 or 6. When generating encoding code for the bc1f and bc1t constructors, the toolkit warns that their output patterns are under-constrained and chooses the cop1code = 4, as described in the first disjunct for bc1x. Toolkit-generated decoding code recognizes either variant.

Like the integer opcodes, the floating-point opcodes are grouped by assembly-language syntax.

<pattern and constructor specifications>+= (U->) [<-D->]
patterns
  arith3. is add. | div. | mul. | sub.
  arith2. is abs. | mov. | neg. 
  movec1  is mfc1 | mtc1 | cfc1 | ctc1
  c.cond  is c.f  | c.un   | c.eq  | c.ueq | c.olt | c.ult | c.ole | c.ule |
             c.sf | c.ngle | c.seq | c.ngl | c.lt  | c.nge | c.le  | c.ngt  
  lsc1    is lwc1 | swc1
  convert is cvt.s | cvt.d | cvt.w

The following constructor specifications introduce the compound-opcode concatenation operator (^). Each floating-point opcode has three variants, according to whether the value of the format field is s, d, or w. We form the constructors by concatenating opcode names with format. One constructor is created for each member of the cross product of the patterns named, (e.g., add.s, add.d, add.w, div.s, div.d, etc). Constructor names can also include string literals delimited by double quotes, e.g. convert^"."^format. As with the integer instructions, most output patterns are implicit.

The floating-point arithmetic, conditional, and type-conversion instructions all require floating-point operands that are even-odd register pairs. The equations for arith3.^format, etc., below specify that the register operands are all even values. These equations have been commented out because we believe people won't want to pay the overheads of checking when generating encoding procedures, but we want them when we validate the spec, so I have arranged for the command sed 's/# {/{/' to uncomment them.

<pattern and constructor specifications>+= (U->) [<-D]
constructors
  arith3.^format fd, fs, ft # { fd = 2 * _, fs = 2 * _, ft = 2 * _ }
  arith2.^format fd, fs     # { fd = 2 * _, fs = 2 * _ }
  movec1         rt, fs     
  c.cond^"."^format  fs, ft # { fs = 2 * _, ft = 2 * _ }
  convert^"."^format fd, fs # { fd = 2 * _, fs = 2 * _ }
  "bc1f" reloc { reloc = L + 4 * offset! } is bc1f & offset; L: epsilon
  "bc1t" reloc { reloc = L + 4 * offset! } is bc1t & offset; L: epsilon

We have to quote bc1f and bc1t instead of using them directly as opcodes, because these patterns have multiple disjuncts. The constructor-specification process ``explodes'' opcodes, so it would pick just one disjunct, and we would be unable to recognize the other when decoding an application of the constructor.

There are several synthetic instructions that operate on pairs of floating point registers. l.d and s.d load and store double words, respectively. mtc.d and mfc.d move pairs of words in registers to and from the floating point co-processor.

<mips-synth.spec>+= [<-D]
constructors
  l.d ft,offset!(base) # { ft = 2 * _ } 
      is  lwc1(ft, offset!, base); lwc1(ft+1, offset!+4, base)
  s.d ft,offset!(base) # { ft = 2 * _ } 
      is  swc1(ft, offset!, base); swc1(ft+1, offset!+4, base)
  mtc1.d rt, fs  # { rt = 2 * _, fs = 2 * _ } 
      is  mtc1(rt, fs); mtc1(rt+1, fs+1)
  mfc1.d rt, fs  # { rt = 2 * _, fs = 2 * _ } 
      is  mfc1(rt, fs); mfc1(rt+1, fs+1)

Because offset is a field, it is unsigned, and so we must use offset! to get the signed offset that lwc1 and similar constructors expect.

<mips-trunc.spec>=
constructors
  trunc.w.d fs, ft, rt is 
    cfc1(rt, f31); 
    cfc1(rt, f31);   nop(); 
    ori(r1, rt, 3);
    xori(r1, r1, 2); 
    ctc1(r1, f31); 
    srlv(r0, r0, r0);
    cvt.w.d(fs, ft); nop();
    ctc1(rt, f31);   nop(); 
    mfc1(rt, fs);    nop()

Application-specific specifications for the mld linker

An application writer has the option of extending a machine's specification to include application-specific information. These extensions can simplify implementation of the application or can be used by the toolkit to generate more efficient encoding code. mld is a retargetable, optimizing linker that generates code for the MIPS or the SPARC. mld uses the toolkit to emit executable binary directly, instead of going through the assembler. This means it needs substitutes for the synthetic div, divu, rem, and remu instructions provided by the MIPS assembler.

The MIPS hardware does no checking for divide by zero or divisions that overflow; these checks have to be done in software. A break 7 instruction is used to indicate one of these errors; so mld has to generate code that does these tests. The first test just checks to see if the divisor is zero, branching around the break instruction if it isn't. If, for some reason, we can tell at compile time that the divisor is zero, we just emit a break, because that's what the MIPS assembler does.

<mld-mips.spec>= [D->]
constructors
  break7ifzero rt
    when { rt = 0 } is break(7)
    otherwise       is bne(rt, r0, L); nop(); break(7); L: epsilon

The other test, for overflow, is more complicated. On a two's-complement machine, integer division can overflow if it is signed, if the divisor is -1, and if the dividend is the most negative integer---there aren't enough bits to represent the result.

<mld-mips.spec>+= [<-D->]
constructors
  break7ifoverflow rs, rt is
        addiu(r1, r0, -1);
        bne(rt, r1, L);   
        lui(r1, 0x8000);  
        bne(rs, r1, L);   
        nop();            
        break(7);
    L : epsilon

Given these two tests, we can define safe versions of signed and unsigned division and remainder.

<mld-mips.spec>+= [<-D->]
constructors
  tested_divu rd, rs, rt
    is divu(rs, rt); nop(); break7ifzero(rt); mflo(rd); nop()
  tested_div rd, rs, rt
    is div(rs, rt);  nop(); break7ifzero(rt); break7ifoverflow(rs, rt); mflo(rd); nop()
  tested_remu rd, rs, rt
    is divu(rs, rt); nop(); break7ifzero(rt); mfhi(rd); nop()
  tested_rem rd, rs, rt
    is div(rs, rt);  nop(); break7ifzero(rt); break7ifoverflow(rs, rt); mfhi(rd); nop()

Applications can also play games to trade safety for efficiency. Toolkit-generated encoding procedures prevent the application from putting a value in a field that is too small for it, so for example they won't let an application put 255 in a 5-bit field. If an application writer wants to promise that the values of a field will always fit, it can call the field ``unchecked,'' and the toolkit will omit the check, but it will still mask out the high bits. If the application can guarantee the high bits are zero, it can call a field ``guaranteed,'' and the toolkit won't even do the masking. Many applications, including mld, include register allocators that always assign proper values to the fields denoting registers:

<mips-regs.spec>=
fieldinfo [ rs rt rd base fs ft fd ] is [ guaranteed ]

mld emits relocatable addresses into the data segments. There is an argument for putting an address-emitting procedure into the toolkit library, but then that procedure would have to deal with all the details of generating a closure and so on when the value of the address wasn't yet known. Since the toolkit's generator already takes care of those details, it is much easier simply to specify a constructor that emits an address. We create a new token so we can distinguish a 32-bit data token from a 32-bit instruction token, and we choose a placeholder that is known to be an illegal address.

<mld-mips.spec>+= [<-D]
fields of addrtoken (32) addr32 0:31
placeholder for addrtoken is addr32 = 7
constructors
  emit_raddr reloc is addr32 = reloc

Specifications to support validation

We have validated this specification against the DEC-Ultrix MIPS assembler. To do that, we had to alter a little bit of syntax, and we had to discard some constructors that this assembler didn't know about.

<mips-names.spec>=
assembly opcode
  jr                is j
  jalr              is jal
  mov               is move
  {sll,srl,sra}v    is $1
  tested_{*}        is $1
assembly operand
  [ rs rt rd base ] is "$%d"
  [ fs ft fd ]      is "$%s"

The machine instructions not recognized by the MIPS assembler are discarded.

<mips-check.spec>=
discard 
  break lwc0 swc0 arith3.^"w" arith2.^"w" c.cond^"."^"w" cvt.s.s
  cvt.w.w cvt.d.d swc2 lwc2 swc3 lwc3 

On the MIPS, the checker's assembly output must include the noreorder directive.

<mips-checker.s>=
.set noreorder