The Universal Machine Macro Assembler

Norman Ramsey
 

Introduction

With only 14 instructions, the Universal Machine is a Spartan environment for even the most seasoned assembly-language programmer. The Universal Machine Macro Assembler, called umasm, is a front end that extends the Universal Machine to create a more usable assembly language. [In assembly-language jargon, a "macro" is something that appears to be a single instruction but actually expands to a sequence of machine instructions. A true macro assembler would let you, the programmer, define your own macros. Maybe next year.]
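To make the idea of a macro concrete, here is a minimal sketch in C of how a single source instruction such as r1 := r2 & r3 could expand into a sequence of two machine instructions. It relies on the fact that NAND followed by a self-NAND (a bitwise complement) yields AND. The sketch assumes the standard Universal Machine encoding (opcode in the top four bits, registers A, B, and C in the low nine bits, NAND as opcode 6). The names um_three_reg and expand_and are hypothetical; they are not part of umasm or its library, and this is not necessarily how umasm itself expands the operator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { UM_NAND = 6 };  /* assumed opcode for bitwise NAND */

/* Pack a three-register instruction word: opcode in bits 28-31,
 * register A in bits 6-8, B in bits 3-5, C in bits 0-2.
 * (Hypothetical helper, not part of umasm.) */
uint32_t um_three_reg(unsigned op, unsigned a, unsigned b, unsigned c)
{
    assert(op < 13 && a < 8 && b < 8 && c < 8);
    return ((uint32_t)op << 28) | ((uint32_t)a << 6)
         | ((uint32_t)b << 3) | (uint32_t)c;
}

/* Expand the macro "dest := x & y" into two NAND instructions.
 * Returns the number of words written to out. */
size_t expand_and(uint32_t out[2], unsigned dest, unsigned x, unsigned y)
{
    out[0] = um_three_reg(UM_NAND, dest, x, y);        /* dest := ~(x & y) */
    out[1] = um_three_reg(UM_NAND, dest, dest, dest);  /* dest := ~dest    */
    return 2;
}
```

This particular expansion needs no temporary register, because the intermediate result can live in the destination. Other macros may need scratch registers, which is presumably what the "using" clause and the .temps directive in the grammar are for.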


  <comment> ::= from # or // to end-of-line
 <reserved> ::= if | m | goto | map | segment | nand | xor | string
             |  unmap | input | output | in | program | using
             |  off | here | halt | words | push | pop | on | stack
    <ident> ::= identifier as in C, except <reserved> or <reg>
    <label> ::= <ident>
      <reg> ::= rNN, where NN is any decimal number
        <k> ::= <hex-literal> | <decimal-literal> | <character-literal>
   <lvalue> ::= <reg> | m[<reg>][<rvalue>]
   <rvalue> ::= <reg> | m[<reg>][<rvalue>]
             |  <k> | <label> | <label> + <k> | <label> - <k>
    <relop> ::= != | == | <s | >s | <=s | >=s
    <binop> ::= + | - | * | / | nand | & | '|' | xor | mod
     <unop> ::= - | ~
    <instr> ::= <lvalue> := <rvalue>
             |  <lvalue> := <rvalue> <binop> <rvalue>
             |  <lvalue> := <unop> <rvalue>
             |  <lvalue> := input()
             |  <lvalue> := map segment (<rvalue> words)
             |  <lvalue> := map segment (string <string-literal>)
             |  unmap m[<reg>]
             |  output <rvalue>
             |  output <string-literal>
             |  goto <rvalue> [linking <reg>]
             |  if (<rvalue> <relop> <rvalue>) goto <rvalue>
             |  if (<rvalue> <relop> <rvalue>) <lvalue> := <rvalue>
             |  push  <rvalue> on   stack <reg>
             |  pop  [<lvalue> off] stack <reg>
             |  halt
             |  goto *<reg> in program m[<reg>]
<directive> ::= .section <ident>
             |  .data <label> [(+|-) <k>]
             |  .data <k>
             |  .space <k>
             |  .string <string-literal>
             |  .zero <reg> | .zero off              // identify zero register
             |  .temps <reg> {, <reg>} | .temps off  // temporary regs
     <line> ::= {<label>:} [<instr> [using <reg> {, <reg>}] | <directive>]
  <program> ::= {<line> (<comment> | newline | ;)}
Grammar for the Universal Machine Macro Assembler

Notable features of the Macro Assembler

The grammar figure above gives the full language accepted by the Macro Assembler; the start symbol of the grammar is <program>, at the bottom. The nonterminals of major interest are <instr> and <directive>.

Using the Macro Assembler

Usage of the Macro Assembler is straightforward:

  umasm [-help] [-grammar] [-o out.um] [source.ums ...]
The -help option prints a longer explanation of options, including several options that are intended only for debugging the Macro Assembler itself. The -grammar option prints the input language of the Macro Assembler. The -o option names a file to which the binary UM code should be written; if not given, the Assembler writes to standard output.

Building the Macro Assembler

The Macro Assembler is written in a combination of C (700 lines) and Lua (800 lines) stuck together with about 500 lines of glue code. In addition to Hanson's CII library, it also uses the LPEG parsing library implemented by Roberto Ierusalimschy. All that code is just the outer shell; without your core assembler of roughly 300–350 lines of C, it doesn't do anything. To create a Universal Machine Macro Assembler, you need to link your code against these four libraries:

   ... -lumasm -llua5.1-lpeg -llua `pkg-config --libs cii40`
At that point you can start testing.

Testing the Macro Assembler

I have unit-tested almost all of the Macro Assembler instructions, with special emphasis on the conditional gotos. [The new conditional moves have not been tested.] My testing has found enough bugs that I am convinced there are many more bugs lurking. Ideally I would create a program to generate a full test suite using randomized inputs.

To begin testing your own assembler, use the output of umdump -bare. Given any binary foo.um, you can try the following sequence of commands:

  umdump -bare foo.um | tee foo.dump | ./umasm > new-foo.um
  umdump -bare new-foo.um | diff foo.dump -
If your assembler handles the basic instructions correctly, the diff command should show no differences. Of course, this test addresses only the basic assembly of instructions. To test the macro instructions and operators that go beyond the basic instruction set, you will have to devise your own .ums files.
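If you have no binary handy for the round-trip test, a tiny one can be generated by hand. The following is a minimal sketch in C, assuming the standard Universal Machine binary format (32-bit words stored big-endian) and the standard opcode numbering (halt is 7, output is 10, load value is 13). The helper names are hypothetical and are not part of umasm or umdump. The program it writes loads 'A' into r1, outputs r1, and halts.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Load-value word (opcode 13): register in bits 25-27, value in bits 0-24. */
uint32_t um_loadval(unsigned reg, uint32_t value)
{
    assert(reg < 8 && value < (1u << 25));
    return (13u << 28) | ((uint32_t)reg << 25) | value;
}

/* Three-register word: opcode in bits 28-31, A in 6-8, B in 3-5, C in 0-2. */
uint32_t um_op(unsigned op, unsigned a, unsigned b, unsigned c)
{
    assert(op < 13 && a < 8 && b < 8 && c < 8);
    return ((uint32_t)op << 28) | ((uint32_t)a << 6)
         | ((uint32_t)b << 3) | (uint32_t)c;
}

/* Write one word, most significant byte first. Returns 0 on success. */
int um_put_word(FILE *fp, uint32_t w)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        if (putc((int)((w >> shift) & 0xff), fp) == EOF)
            return -1;
    return 0;
}

/* Write a three-instruction sample program. Returns 0 on success. */
int write_sample(FILE *fp)
{
    const uint32_t prog[] = {
        um_loadval(1, 'A'),   /* r1 := 'A'                        */
        um_op(10, 0, 0, 1),   /* output r1 (register C is output) */
        um_op(7, 0, 0, 0),    /* halt                             */
    };
    for (size_t i = 0; i < sizeof prog / sizeof prog[0]; i++)
        if (um_put_word(fp, prog[i]) != 0)
            return -1;
    return 0;
}
```

Calling write_sample on a file opened with fopen("foo.um", "wb") produces a 12-byte binary that can be fed to the two commands above. Whether umdump -bare prints every one of these instructions in a form your assembler accepts is something to check rather than assume.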