With only 14 instructions, the Universal Machine is a Spartan environment for even the most seasoned assembly-language programmer. The Universal Machine Macro Assembler, called umasm, is a front end that extends the Universal Machine to create a more usable assembly language. [In assembly-language jargon, a "macro" is something that appears to be a single instruction but actually expands to a sequence of machine instructions. A true macro assembler would let you, the programmer, define your own macros. Maybe next year.]
I use

    .temps r6, r7

in most of my code, which means that the assembler may destroy the contents of register r6 or r7 at any time.
    .zero r0

This declaration constitutes a promise to the assembler; although register zero is indeed initially zero, if you overwrite it, you have to put it back to zero before claiming it stays zero. The advantage is that the Macro Assembler, relying on r0 being zero, can implement goto using a single Load Program instruction.
It is also possible to turn this feature off and to use register zero as a temporary register, e.g.,

    .zero off
    .temps r0, r6, r7

There is a minor performance penalty: to implement a goto, the Macro Assembler must now load zero into a register.
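The difference between the two settings can be sketched in C. This is only an illustration under stated assumptions, not the Macro Assembler's actual code: the bit layout (opcode in bits 28 to 31, registers A, B, C in bits 6 to 8, 3 to 5, and 0 to 2, and Load Value as opcode 13 with its register in bits 25 to 27) is taken from the Universal Machine specification rather than from this document, and the names um_three, um_loadval, and emit_goto are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { UM_LOADPROG = 12, UM_LOADVAL = 13 };

/* Three-register format (assumed from the UM specification):
   opcode in bits 28-31, A in bits 6-8, B in bits 3-5, C in bits 0-2. */
uint32_t um_three(unsigned op, unsigned ra, unsigned rb, unsigned rc)
{
    assert(op < 14 && ra < 8 && rb < 8 && rc < 8);
    return (uint32_t)op << 28 | (uint32_t)ra << 6 | (uint32_t)rb << 3 | rc;
}

/* Load Value format (assumed from the UM specification):
   opcode in bits 28-31, register in bits 25-27, value in bits 0-24. */
uint32_t um_loadval(unsigned ra, uint32_t value)
{
    assert(ra < 8 && value < (1u << 25));
    return (uint32_t)UM_LOADVAL << 28 | (uint32_t)ra << 25 | value;
}

/* Hypothetical helper: emit a goto to the address held in register
   `target`.  zero_reg is the register promised to hold zero, or -1
   after ".zero off", in which case `temp` is loaded with zero first.
   Returns the number of words written to out[]. */
int emit_goto(uint32_t out[2], unsigned target, int zero_reg, unsigned temp)
{
    int n = 0;
    if (zero_reg < 0) {
        assert(temp != target);          /* must not clobber the target */
        out[n++] = um_loadval(temp, 0);  /* temp := 0 */
        out[n++] = um_three(UM_LOADPROG, 0, temp, target);
    } else {
        out[n++] = um_three(UM_LOADPROG, 0, (unsigned)zero_reg, target);
    }
    return n;
}
```

With a zero register the sketch writes one word; after .zero off it writes two, which is the minor penalty described above.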
    <comment>   ::= from # or // to end-of-line
    <reserved>  ::= if | m | goto | map | segment | nand | xor | string
                 |  unmap | input | output | in | program | using | off
                 |  here | halt | words | push | pop | on | off | stack
    <ident>     ::= identifier as in C, except <reserved> or <reg>
    <label>     ::= <ident>
    <reg>       ::= rNN, where NN is any decimal number
    <k>         ::= <hex-literal> | <decimal-literal> | <character-literal>
    <lvalue>    ::= <reg> | m[<reg>][<rvalue>]
    <rvalue>    ::= <reg> | m[<reg>][<rvalue>] | <k>
                 |  <label> | <label> + <k> | <label> - <k>
    <relop>     ::= != | == | <s | >s | <=s | >=s
    <binop>     ::= + | - | * | / | nand | & | '|' | xor | mod
    <unop>      ::= - | ~
    <instr>     ::= <lvalue> := <rvalue>
                 |  <lvalue> := <rvalue> <binop> <rvalue>
                 |  <lvalue> := <unop> <rvalue>
                 |  <lvalue> := input()
                 |  <lvalue> := map segment (<rvalue> words)
                 |  <lvalue> := map segment (string <string-literal>)
                 |  unmap m[<reg>]
                 |  output <rvalue>
                 |  output <string-literal>
                 |  goto <rvalue> [linking <reg>]
                 |  if (<rvalue> <relop> <rvalue>) goto <rvalue>
                 |  if (<rvalue> <relop> <rvalue>) <lvalue> := <rvalue>
                 |  push <rvalue> on stack <reg>
                 |  pop [<lvalue> off] stack <reg>
                 |  halt
                 |  goto *<reg> in program m[<reg>]
    <directive> ::= .section <ident>
                 |  .data <label> [(+|-) <k>]
                 |  .data <k>
                 |  .space <k>
                 |  .string <string-literal>
                 |  .zero <reg> | .zero off               // identify zero register
                 |  .temps <reg> {, <reg>} | .temps off   // temporary regs
    <line>      ::= {<label>:} [<instr> [using <reg> {, <reg>}] | <directive>]
    <program>   ::= {<line> (<comment> | newline | ;)}

Figure: Grammar for the Universal Machine Macro Assembler
The grammar figure above gives the full language accepted by the Macro Assembler; the start symbol of the grammar is <program>, at the bottom. The nonterminals of major interest are <instr> and <directive>.
To use the stack instructions, you must choose a register to serve as stack pointer, allocate space in segment zero for a stack, and set the stack pointer to point to the space you have allocated.
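The three steps can be modeled in C as a rough sketch. The important assumption here, which this document does not state, is that the stack grows downward: push decrements the stack pointer and then stores, and pop loads and then increments. The Machine type, the sizes, and the function names are all hypothetical stand-ins for segment zero and the stack-pointer register.

```c
#include <assert.h>
#include <stdint.h>

enum { SEG0_WORDS = 16, STACK_WORDS = 4 };   /* hypothetical sizes */

typedef struct {
    uint32_t seg0[SEG0_WORDS];   /* stand-in for segment zero */
    uint32_t sp;                 /* stand-in for the chosen stack-pointer register */
} Machine;

/* Reserve words [base, base + STACK_WORDS) of segment zero for the stack
   and point sp just past the reserved space (assumed downward growth). */
void stack_init(Machine *m, uint32_t base)
{
    assert(base + STACK_WORDS <= SEG0_WORDS);
    m->sp = base + STACK_WORDS;
}

/* Assumed push: decrement the pointer, then store the value. */
void stack_push(Machine *m, uint32_t value)
{
    assert(m->sp > 0);
    m->sp -= 1;
    m->seg0[m->sp] = value;
}

/* Assumed pop: load the value, then increment the pointer. */
uint32_t stack_pop(Machine *m)
{
    assert(m->sp < SEG0_WORDS);
    uint32_t value = m->seg0[m->sp];
    m->sp += 1;
    return value;
}
```

If the stack pointer is never initialized, it starts at zero and a push in this model would write outside the reserved space, which is why all three steps are needed before the first push.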
Usage of the Macro Assembler is straightforward:
    umasm [-help] [-grammar] [-o out.um] [source.ums ...]

The -help option prints a longer explanation of options, including several options that are intended only for debugging the Macro Assembler itself. The -grammar option prints the input language of the Macro Assembler. The -o option names a file to which the binary UM code should be written; if not given, the Assembler writes to standard output.
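As a hedged illustration of what writing binary UM code involves, the sketch below serializes 32-bit instruction words as four bytes each, most significant byte first. That byte order comes from the Universal Machine specification, not from this document, and the function names um_word_bytes and um_write_words are hypothetical; the real interface between your core assembler and the umasm library may look quite different.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Split one 32-bit word into four bytes, most significant byte first
   (byte order assumed from the UM specification). */
void um_word_bytes(uint32_t word, unsigned char bytes[4])
{
    bytes[0] = (unsigned char)(word >> 24);
    bytes[1] = (unsigned char)(word >> 16);
    bytes[2] = (unsigned char)(word >> 8);
    bytes[3] = (unsigned char)word;
}

/* Write n words to fp; returns 0 on success, -1 on a short write. */
int um_write_words(FILE *fp, const uint32_t *words, size_t n)
{
    assert(fp != NULL && words != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char bytes[4];
        um_word_bytes(words[i], bytes);
        if (fwrite(bytes, 1, 4, fp) != 4)
            return -1;
    }
    return 0;
}
```

Passing stdout as fp corresponds to the default behavior when -o is not given; a file opened in binary mode corresponds to -o out.um.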
The Macro Assembler is written in a combination of C (700 lines) and Lua (800 lines) stuck together with about 500 lines of glue code. In addition to Hanson's CII library, it also uses the LPEG parsing library implemented by Roberto Ierusalimschy. All that code is just the outer shell; without your core assembler of roughly 300–350 lines of C, it doesn't do anything. To create a Universal Machine Macro Assembler, you need to link your code against these four libraries:
    ... -lumasm -llua5.1-lpeg -llua `pkg-config --libs cii40`

At that point you can start testing.
I have unit-tested almost all of the Macro Assembler instructions, with special emphasis on the conditional gotos. [The new conditional moves have not been tested.] My testing has found enough bugs that I am convinced there are many more bugs lurking. Ideally I would create a program to generate a full test suite using randomized inputs.
To begin testing your own assembler, use the output of umdump -bare. Given any binary foo.um, you can try the following sequence of commands:
    umdump -bare foo.um | tee foo.dump | ./umasm > new-foo.um
    umdump -bare new-foo.um | diff foo.dump -

If your assembler handles the basic instructions correctly, the diff command should show no differences. Of course, this test addresses only the basic assembly of instructions.