\newtoks\tyboxes
\deeperboxspace=0pt
Norman Ramsey
\titlehead\uppercaseUnparsing with prefix and postfix operators \authorheadNorman Ramsey \received29 August 1997 \revised27 March 1998 \spe00MONTH98130
\catcode`\|=\active\gdef\makevbar\catcode`\|=\active \chardef\lbr`{\chardef\rbr`}
--- \productionvbar --- \catcode`\|=\active\gdef\makeproductionvbar\catcode`\|=\active
Unparsing is the problem of transforming an internal representation of a program into an external, concrete syntax. In conjunction with prettyprinting, it is useful for generating readable programs from internal representations. If the target language uses prefix and postfix operators, the problem is nontrivial. This paper shows how to unparse expressions using a simple, bottom-up tree walk, which keeps track of the least tightly binding operator not enclosed by parentheses. During the tree walk, this operator is compared with the operator of the parent expression, and parentheses are inserted based on the precedence, associativity, and fixity (infix, prefix, or postfix) of the two operators. The paper is a literate program. It includes code for the unparser and for its inverse parser, both of which can handle operators of dynamically chosen precedence and associativity. Supporting such operators is useful for languages like ML, in which programmers may assign precedence and associativity to their own functions.
Automatic generation of computer programs from high-level specifications is a well-known technique, which has been used to create scanners, parsers, code generators, tree rewriters, packet filters, dialogue handlers, machine-code recognizers, and other tools. To make it easier to debug an application generator, and to help convince prospective users that the generated code can be trusted, the generated code should be as idiomatic and readable as possible. It is therefore desirable, and sometimes essential, that the generated code use infix, prefix, and postfix operators, and that expressions be presented without unnecessary parentheses. Another helpful technique is to use a prettyprinter to automate indentation and line breaking.[cite oppen:prettyprinting, hughes:pretty-printing] This paper presents a method of automatically parenthesizing expressions that use prefix, postfix, and implicit operators; the method is compatible with automatic prettyprinting. The paper also shows how to parse expressions that use such operators. The parsing algorithm can be used even if operator precedences are not known at compile time, so it can be used with arbitrarily many user-defined precedences.
This paper is a ``literate program,''[cite knuth:literate] which means that a single source file is used to prepare both manuscript and code. Including code provides a formal, precise, testable statement of the unparsing algorithm. The code is written in the purely functional subset of Standard ML.[cite milner:definition] ML is a good choice for several reasons. Referential transparency makes the subset easy to reason about, so we can show the correctness of simple unparsing algorithms. It is possible to express some properties of the data as type properties, which are checked by the compiler. Finally, SML's parameterized modules make the code reusable in several different contexts---the code that appears in the paper is the code used in both the SLED and the lambda-RTL compilers,[cite ramsey:specifying, ramsey:embedded] for example. The paper explains many ML constructs as they are used, but readers not familiar with ML may still wish to consult References \refnoullman:elements--\refnofelleisen:mler for an introduction.
The paper presents algorithms for two versions of the unparsing problem. In the first version, all operators are binary infix operators; this version of the unparsing problem is well understood, and it is simple enough to make a proof of correctness tractable. The invariants and insights from this proof lead to an algorithm for a more general version, which handles prefix and postfix operators, as well as non-associative infix operators of arbitrary arity. The algorithm can also handle some ``mixfix'' operators, like the array-indexing operator \cdot[\cdot]. The algorithm is the main contribution of this paper. It is presented as executable code, which uses parameterized modules to make it independent of the representation of the unparsed form.
Parsers and unparsers translate between concrete and abstract syntax. The concrete syntax used to represent expressions is usually ambiguous; ambiguities are resolved using operator precedence and parentheses. Internal representations are unambiguous but not suitable as concrete syntax. For example, expressions are often represented as simple trees, in which an expression is either an ``atom'' (constant or variable) or an operator applied to a list of expressions. We can describe such trees by the following grammar, written in extended BNF,[cite wirth:ebnf] in which nonterminals appear in \ntitalic and terminals in \termslant fonts: {production}expression \termatom | \termoperator \sequence\ntexpression {production} LISP and related languages use a nearly identical concrete syntax, simply by requiring parentheses around each application. The designers of other languages have preferred other concrete syntax, using infix binary operators, prefix and postfix unary operators, and possibly ``mixfix'' operators, as described by the following grammar: {production}expression \termatom | \ntexpression \terminfix-operator \ntexpression | \termprefix-operator \ntexpression | \ntexpression \termpostfix-operator | \ntexpression \sequence\termnary-operator \ntexpression | \ntother form of expression... {production} Even if the internal form distinguishes these kinds of operators, this grammar is unsuitable for parsing, because it is ambiguous. Grammars intended for parsing use various levels of precedence and associativity to resolve ambiguities. Once precedence and associativity have been determined, they can be encoded by introducing new nonterminals for each level of precedence. Users of shift-reduce parsers can specify precedence and associativity directly, then have the parser use them to resolve shift-reduce conflicts.
The parsing problem is to translate from concrete syntax to internal representation. The unparsing problem is to take an internal representation and to produce a concrete syntax that, when parsed, reproduces the original internal representation. That is, an unparser is correct only if it is a right inverse of its parser. For example, if our internal representation is a tree representing the product of x+y with z, we cannot simply walk the tree and emit the concrete representation x+y×z; we must emit (x+y)×z instead.
As stated, the unparsing problem does not have a unique solution; in general, there are many possible concrete syntaxes that, when parsed, reproduce the original internal representation.[*] For example, we could choose a concrete syntax that parenthesizes every expression, as in (((x)+(y))×(z)). The LISP rule parenthesizes every application, as in ((x+y)×z). Rules like these use too many parentheses. Code with extra parentheses is unidiomatic and unreadable, like this C code, which has been emitted using something like the LISP rule:
if ((((Mem.u).Index8).index) != 4) if (!((unsigned(((Mem.u).Index8).d)) < 0x100)) fail((("((Mem.u).Index8).d = 0x%x won't fit in 8 unsigned bits"), (((Mem.u).Index8).d))); else { emit((((7 | (15 << 4)) | (1 << 3)), 1)); emit((((4 | (3 << 3)) | (1 << 6)), 1)); emit((((((((Mem.u).Index8).ss) << 6) | ((((Mem.u).Index8).index) << 3)) | (((Mem.u).Index8).base)), 1)); emit((((((Mem.u).Index8).d) & 0xff), 1)); } else fail(("Conditions not satisfied for constructor CALL.Epod"));The unparser in this paper is not merely a parser's right inverse; it also produces results without extra parentheses:
if (Mem.u.Index8.index != 4) if (!((unsigned)Mem.u.Index8.d < 0x100)) fail("Mem.u.Index8.d = 0x%x won't fit in 8 unsigned bits", Mem.u.Index8.d); else { emit(7 | 15 << 4 | 1 << 3, 1); emit(4 | 3 << 3 | 1 << 6, 1); emit(Mem.u.Index8.ss << 6 | Mem.u.Index8.index << 3 | Mem.u.Index8.base, 1); emit(Mem.u.Index8.d & 0xff, 1); } else fail("Conditions not satisfied for constructor CALL.Epod");
The unparsing method uses information about precedence,
associativity,
and ``fixity'' of operators to transform an internal form into
a concrete syntax.
The method works from the bottom up and from the
inside out.
In each expression, it finds the operator, and it considers the
subexpressions and their positions, left or right, relative to the
operator.
It decides whether to parenthesize subexpressions by comparing
precedence and associativity of the current operator with the
precedences and associativities of the most loosely binding operators
in the subexpressions.
Operators that are ``covered'' by parentheses do not participate in
the comparison.
Here are four example trees:
\pssetlinewidth=0.4pt
cccc
\pstree[nodesep=2pt,levelsep=15pt]
\TR×
\pstree\TR+\TRx\TRy
\TRz
---
\pstree[nodesep=2pt,levelsep=15pt]
\TR+
\pstree\TR×\TRx\TRy
\pstree\TR+\TRz\TRw
---
\pstree[nodesep=2pt,levelsep=15pt]
\TR\dpsmashpostfix ++
\pstree\TR*
\TRp
---
\pstree[nodesep=2pt,levelsep=15pt]
\TR\dpsmashprefix ++
\pstree\TR*
\TRp
(a) --- (b) --- (c) --- (d)[*]
In tree (a), discussed above, the unparser renders the left
subtree as x+y, without parentheses, because the subexpressions
x and y contain no operators.
At the root, because operator × has higher precedence than
operator +, the unparser parenthesizes the left subexpression, and
the final result is (x+y)×z.
In tree (b), + has lower precedence than ×, so it is not
necessary to parenthesize the left subtree.
Because + is left-associative, however, it is necessary to
parenthesize the right subtree, because it is a right child of a
parent with the same precedence and left associativity.
The final result is therefore x ×y + (z + w).
Trees (c) and (d) show prefix and postfix operators as used in C.
In tree (c), ++
and *
have opposite ``fixity'' (one is postfix
and one is prefix), and because ++
has higher precedence
than *
, it is necessary to parenthesize the subexpression,
resulting in (*p)++
.
In tree (d), ++
and *
have the same fixity, so precedence is
irrelevant, and the expression is unparsed as ++*p
.
Although the full unparsing method handles prefix, postfix, and infix operators, it is easier to understand if we begin with the simpler case in which all operators are binary infix operators.
Type rator
represents a binary infix operator, which has a text
representation, a precedence, and an
associativity:
<infix>= (U->) [D->] type precedence = int datatype associativity = LEFT | RIGHT | NONASSOC type rator = string * precedence * associativity
Definesassociativity
,precedence
,rator
(links are to index).
This ML code uses simple integers (int
) to represent precedence,
an enumeration to represent associativity, and a triple to represent
an operator.
(In the context of an ML type definition, a *
does not represent
multiplication; it connects elements of a tuple.)
The more general unparser, presented below, shows how to use an
arbitrary type, not just string
, as an operator's concrete
representation.
Precedence and associativity determine how infix expressions are parsed into trees, or equivalently, how they are parenthesized. For example, if operator \otimes has higher precedence than operator \oplus, then x \oplusy \otimesz = x \oplus(y \otimesz) and x \otimesy \oplusz = (x \otimesy) \oplusz. When two operators have the same precedence, associativity is used to disambiguate. If \oplus is left-associative, x \oplusy \oplusz = (x \oplusy) \oplusz; if it is right-associative, x \oplusy \oplusz = x \oplus(y \oplusz). Some languages have non-associative operators; if \oplus is non-associative, then
(x \oplusy) \oplusz !=x \oplusy \oplusz !=x \oplus(y \oplusz),and x \oplusy \oplusz may be illegal. The comma that separates parameters in a C function call is an example of a non-associative operator.
Many expressions are obtained by applying operators to other expressions, but there must always be indivisible constituents of expressions. We call such constituents atoms. They appear at the leaves of abstract syntax trees and as the non-operator tokens in concrete syntax. In most languages the atoms include identifiers, integer and real literals, string literals, and so on. In our initial, simple model, an expression is either an atom or an infix operator applied to two expressions:
<infix>+= (U->) [<-D->] datatype ast = ATOM of string | APP of ast * rator * ast
Definesast
(links are to index).
Type ast
represents an expression's abstract syntax tree.
This ML datatype
definition is something like a production in a context-free
grammar; it gives two alternative ways of constructing an ast
.
These alternatives are always given names (ATOM
and
APP
), which are called constructors.
An ML datatype may contain any number of constructors, and there may
be data associated with each one.
Here, for an atom, we care only about the atom's string
representation.
For an application, we want the operator and the ast
s representing
the two operands.
Because ast
is used in its own definition, it is a recursive type,
and values of type ast
are trees.
Type type ast
represents the input to the unparser; we also need a
type to represent the unparser's output.
For simplicity, we treat this output as a sequence of lexemes, where a
lexeme represents an atom, an operator, or a parenthesis.
Moreover, we undertake to emit only sequences in which atoms and
operators alternate, or in which parenthesized sequences take the
place of atoms.
Let us call such a sequence an image
and use a representation that
forces it to satisfy the following grammar:
{production}image\ntlexical-atom \sequence\termoperator \ntlexical-atom{production}
{production}lexical-atom\termatom \vbar \lit( \ntimage \lit)
{production}
In the corresponding ML, the sequence in braces becomes
the type image'
:
<infix>+= (U->) [<-D->] datatype image = IMAGE of lexical_atom * image' and image' = EOI (* end of image *) | INFIX of rator * lexical_atom * image' and lexical_atom = LEX_ATOM of string | PARENS of image
Definesimage
,image'
,lexical_atom
(links are to index).
EOI
represents the end of the image, and PARENS
represents
an image in parentheses.
This representation enforces the invariants that expressions and
operators alternate,
and that the first and last elements of an image are expressions.
To be correct, an unparser must produce a sequence that parses back
into the original abstract syntax tree.
We develop an unparsing algorithm by thinking about parsing.
To understand how to minimize parentheses, we need to consider where
parentheses are needed to get the correct parse.
Suppose we have an abstract syntax tree that has been obtained by
parsing an image without
any parentheses.
Then wherever the syntax tree has an APP
node whose parent is also
an APP
node, there are two cases: the child may be the left child
or the right child of its parent:
[picture] \drawline(54,-16)(50,-26) \dashline1(56,-16)(60,-26) \qquad\qquad[picture] \dashline1(54,-16)(50,-26) \drawline(56,-16)(60,-26)Because this tree was obtained by parsing parenthesis-free syntax, either the inner operator has higher precedence than the outer, or they have the same precedence and associativity and the associativity is to the left (right) if the inner is a left (right) child of the outer. We formalize this condition using the predicate
noparens(inner, outer,
side)
, where side
is LEFT
or RIGHT
and
noparens
is defined
as follows:
\addboxnoparens : rator * rator * associativity -> bool
\deeperbox2pt
<infix>+= (U->) [<-D->] fun noparens (inner as (_, pi, ai), outer as (_, po, ao), side) = pi > po orelse pi = po andalso ai = ao andalso ao = side (*unboxval*)
Definesnoparens
(links are to index).
where pi
, ai
, po
, and ao
stand for precedence or
associativity of inner or outer operators.
This ML code defines a function noparens
.
Its name and type are given in the box at the beginning of the chunk.
Type assertions in boxes are inserted into the code and checked by the
compiler.
The type of noparens
shows that it is a Boolean function
of three arguments (two operators and an
associativity).
The structure of the definition is
fun
\termname \ntbinding =
\ntbody.
The \ntbinding associates names to the arguments; these names are
used in the body.
inner as (_, pi, ai)
is a nested binding; it binds pi
to the precedence of inner
and ai
to the associativity
of inner
.
The underscore _
is a ``wildcard,'' which indicates that the
text representation of inner
is to be ignored and not bound to any
name.
The process of using the \ntbinding to associate names with the
arguments and parts of the arguments is called ML pattern matching.
Readers familiar with the treatment of operator-precedence parsing in Section 4.6 of [cite aho:compilers] may recognize that
noparens(i, o, LEFT) --- === --- i \dotgtoWe use
noparens(i, o, RIGHT) --- === --- o \dotlti
noparens
during unparsing to decide when parentheses are
needed and during parsing to decide whether to shift or reduce.
To prove correctness of the unparser, I introduce a simple LR parser.
The unparser is correct if parse (unparse e) = e
for any syntax tree e
.
In other words, if we begin with an abstract syntax tree e
,
unparse it, and then parse the resulting lexemes, we get back the
original abstract syntax tree.
In the parser's stack, expressions and operators alternate. Unless the stack is empty, there is an operator on the top, which is written to the right.
<infix>+= (U->) [<-D->] datatype stack = BOT | PUSH of stack * ast * rator
Definesstack
(links are to index).
This recursive datatype is isomorphic to a list of (operator, ast)
pairs, but it is helpful to define a special type because the
constructors BOT
and
PUSH
distinguish parser stacks from other kinds of lists.
The parser itself can be simplified by a bit of trickery.
Instead of treating ``bottom of stack'' or ``end of input'' as special
cases, treat them as occurrences of a sentinel operator,
minrator
.
minrator
has precedence minprec
, which
must be lower than the precedence of any real
operator.
Using this trick, we define functions that return the operator on
the top of the stack and the operator about to be scanned in the input.
\addboxsrator : stack -> rator
\addboxirator : image' -> rator
\deeperbox2pt
<infix>+= (U->) [<-D->] val minrator : rator = ("<minimum-precedence sentinel>", minprec, NONASSOC) fun srator (PUSH (_, _, $)) = $ | srator BOT = minrator fun irator (INFIX ($, _, _)) = $ | irator EOI = minrator (*unboxval*)
Definesirator
,minrator
,srator
(links are to index).
Given a stack stack
and an input sequence of tokens ipts
, we
may
abbreviate srator stack
as \rators and
abbreviate irator ipts
as \ratori.
These function definitions show a new ML construct---we define
functions on datatype
s by pattern-matching against the
datatype constructors.
We provide a \ntbinding and a \ntbody for each
constructor.
This ability to use pattern matching to do case analysis is one of the
advantages of using ML, especially because the compiler ensures that
all cases are covered.
This code also illustrates an unusual syntactic feature of ML: we can use
strings of symbols, as well as strings of letters, to make identifiers.
In this paper, by convention, the identifier $
stands for an
arbitrary operator.
We now have enough machinery to write a shift-reduce LR parser,
which continues parsing until its stack
and input are empty.
The parser's state includes a stack with an
operator \rators on top, a current expression e
, and the input, which
begins with the operator \ratori.
This state is represented by three parameters to the function parse'
.
Parsing is complete when the stack is empty (BOT
) and the input is
empty (EOI
).
\addboxparse' : stack * ast * image' -> ast
\addboxparse : image -> ast
\addboxparseAtom : lexical_atom -> ast
<infix>+= (U->) [<-D->] exception Associativity local exception Impossible fun parse' (BOT, e, EOI) = e | parse' (stack, e, ipts) = <shift or reduce, depending on relative precedence of \rators and \ratori> and parse (IMAGE(a, tail)) = parse'(BOT, parseAtom a, tail) and parseAtom (LEX_ATOM a) = ATOM a | parseAtom (PARENS im) = parse im (*unboxval*) in val parse = parse end
DefinesAssociativity
,Impossible
,parse
,parse'
,parseAtom
(links are to index).
The ML code uses two new constructs, and it shows a new way to use
pattern matching.
The declaration local
\ntprivate-declarations in
\ntpublic-declarations end
hides information.
Private declarations are invisible outside the local
construct; only public declarations escape.
The connective and
is used with fun
to define a nest of mutually
recursive functions.
The first match in the definition of parse'
handles the case when
both the stack and the input are empty.
The next match handles all other cases; this works because a variable
in a \ntbinding matches any datatype constructor.
The interesting part of the parser is deciding whether to shift or
reduce.
Shifting pushes e
and \ratori onto the stack.
Reducing pops \rators and the expression
below it, creating a new
APP
node with operator \rators.
Shifting \ratori guarantees that \ratori will be reduced before
\rators, and therefore that \ratori will eventually be a right
descendant of \rators.
Reducing guarantees that \rators will eventually be a left
descendant of \ratori.
The parser uses noparens
to choose whichever alternative is
correct without parentheses.
The choice is deterministic because the conditions for shifting and
reducing are mutually
exclusive; the proof of correctness relies on that fact.
<shift or reduce, depending on relative precedence of \rators and \ratori>= (<-U) if noparens(irator ipts, srator stack, RIGHT) then (* shift *) case ipts of INFIX ($, a, ipts') => parse'(PUSH(stack, e, $), parseAtom a, ipts') | EOI => raise Impossible (* EOI has lowest precedence *) else if noparens(srator stack, irator ipts, LEFT) then (* reduce *) case stack of PUSH (stack', e', $) => parse'(stack', APP(e', $, e), ipts) | BOT => raise Impossible (* BOT has lowest precedence *) else raise Associativity (* saw a + b + c, for some nonassociative + *)
Pattern
matching can be used in case
statements as well as in function definitions.
Here it is used to extract the first two inputs or the expression and
operator on top of the stack.
Because srator
and irator
return minrator
for BOT
and
EOI
, the parser shifts when the stack is empty and
reduces when there are no more inputs.
Therefore,
the lines that raise Impossible
are never
executed; they are included only to keep the compiler from complaining
about cases that are not covered.
The parser uses two different kinds of recursion.
The direct recursive calls from parse'
to itself are all tail
calls; such calls are ML programmers' way of writing loops.
In an imperative language, parse'
would be written with a loop,
and it would modify three variables: the stack, the current
expression, and the inputs. When the stack and
inputs were empty, it would exit the loop by returning the current
expression.
An imperative version of
parse'
would keep the indirect recursive call through
parseAtom
and parse
.
Before getting into the details of unparsing, we define functions that make images from atoms and put parentheses around images. \addboximage : string -> image \addboxparenthesize : image -> image \deeperbox2pt
<infix>+= (U->) [<-D->] fun image a = IMAGE (LEX_ATOM a, EOI) fun parenthesize im = IMAGE (PARENS im, EOI) (*unboxval*)
Definesimage
,parenthesize
(links are to index).
infixImage
combines two images by putting an infix operator
between them. It uses append
as an auxiliary function.
\addboxappend : image' * image' -> image'
\addboxinfixImage : image * rator * image -> image
\deeperbox2pt
<infix>+= (U->) [<-D->] local fun append (EOI, image') = image' | append (INFIX($, a, tail), image') = INFIX($, a, append(tail, image')) in fun infixImage (IMAGE(a, tail), $, IMAGE(a', tail')) = IMAGE(a, append(tail, INFIX($, a', tail'))) (*unboxval*) end
Definesappend
,infixImage
(links are to index).
To unparse an expression, we produce a fragment containing not only the image of the expression, but also the lowest-precedence operator used in the expression. That operator tells us enough to decide whether to parenthesize that expression when it is used.
<infix>+= (U->) [<-D->] type fragment = rator * image
Definesfragment
(links are to index).
The top-level operators of an image are the
operators that appear outside of parentheses.
If an image has top-level operators, then the top-level operators
of least precedence must all have the same associativity, which must
be LEFT
or RIGHT
.
Otherwise, one of those operators would have to be in parentheses,
which contradicts the assumption that they are top-level operators.
The function bracket(frag, side, rator)
parenthesizes a fragment frag
that is to appear next to an operator rator
on the side labelled side
:
\addboxbracket : fragment * associativity * rator -> image
<infix>+= (U->) [<-D->] fun bracket((inner, image), side, outer) = if noparens(inner, outer, side) then image else parenthesize image
Definesbracket
(links are to index).
Given the ability to bracket fragments, unparsing is straightforward.
We create another sentinel operator maxrator
, having precedence maxprec
,
which must be higher than the precedence of any true operator, so we
can use it in fragments made from atoms.
\addboxunparse' : ast -> rator * image
\addboxunparse : ast -> image
\deeperbox2pt
<infix>+= (U->) [<-D] local val maxrator = ("<max-precedence sentinel>", maxprec, NONASSOC) fun unparse' (ATOM a) = (maxrator, image a) | unparse' (APP(l, $, r)) = let val l' = bracket (unparse' l, LEFT, $) val r' = bracket (unparse' r, RIGHT, $) in ($, infixImage(l', $, r')) end in fun unparse e = let val ($, im) = unparse' e in im end (*unboxval*) end
Definesmaxrator
,unparse
,unparse'
(links are to index).
Using bracket
maintains the precedence and associativity
invariants of the fragments.
unparse
first computes a fragment, then discards the operator,
leaving only the image.
Proving the simple unparser correct helps clarify how the parser and unparser work together. The result giving the most insight is Proposition [->], which gives the properties of the top-level operators produced during unparsing. Most of the proof is omitted from the body of the paper; it appears instead in Appendix [->].
\long
--- \prec\rator> \precrator
\lor --- \prec\rator= \precrator
\land\assoc\rator= \assocrator
.
{defn} This relation is likenoparens
, except it ignores the relative positions of operators of the same precedence and associativity. This weakening ofnoparens
yields a reflexive, transitive, binary relation, which is in fact a slight variation on ``has precedence at least as great.'' \stepcounterdefn\ifshortproof
The proof of Proposition [->] is by structural induction on
e
. The base case is straightforward. To get the idea of the proof of the induction step, we use textual juxtaposition to write expressions, inputs, and parser stacks. If e = l \opr, we can write unparse e=a_l t_l \opa_r t_r, where a's are atoms and t's (tails) are sequences of atoms and operators. Writing \parsers\parseAtoma_l t_l \opa_r t_r i for the state of the parser, we prove Proposition [->] by showing {proofstep*} \parsers\parseAtoma_l t_l \opa_r t_r i {proofstep*} = (by the induction hypothesis applied to l) {proofstep*} \parsersl\opa_r t_r i {proofstep*} = (by hypothesis, noparens(\op, srator s, RIGHT), and the parser shifts) {proofstep*} \parsers l \op\parseAtoma_rt_r i {proofstep*} = (by the induction hypothesis applied to r) {proofstep*} \parsers l \opri {proofstep*} = (by hypothesis, noparens(\op, irator i, LEFT), and the parser reduces) {proofstep*} \parsersl \opri = \parsersei {proofstep*} The details involve showing that the preconditions hold and it is safe to apply the induction hypothesis. We need facts about the results returned byunparse
andunparse'
, as stated in the next two propositions.{proof} By structural induction on
e
. Ife
is an atom, writee = ATOM x
, and {proofstep*}parse' (s, parseAtom a, append(tail, i))
{proofstep*} = (because by assumptionunparse e = IMAGE (LEX_ATOM x, EOI)
) {proofstep*}parse' (s, parseAtom (LEX_ATOM x), append(EOI, i))
{proofstep*} = (by definitions ofparseAtom
andappend
) {proofstep*}parse' (s, ATOM x, i)
{proofstep*} = (by assumption) {proofstep*}parse' (s, e, i)
{proofstep*}For the induction step, emust have the form e = APP(l, \op, r), and we can choose l' and r'as in unparse', so unparse e = infixImage(l', \op, r'). Because both l'and r'have type
image
, we can write l' = IMAGE(a_l, t_l)and r' = IMAGE(a_r, t_r). By using juxtaposition to abbreviate appending, we write unparse e = a_l t_l \opa_r t_r. [This shorthand abbreviates unparse e = IMAGE(a_l, append(t_l, INFIX(\op, a_r, t_r))).] We therefore consider {proofstep*} parse'(s,parseAtom
a_l, t_l \opa_r t_r i){proofstep*} = (by applying the induction hypothesis to l', as justified below) {proofstep*} parse'(s, l, \opa_r t_r i){proofstep*} \proofhang= (by definition ofparse'
, the parser shifts, because by hypothesis, \op is a top-level operator fromunparse e
and therefore noparens(\op, srator s, RIGHT)) {proofstep*} parse'(s l \op,parseAtom
a_r, t_r i){proofstep*} = (by applying the induction hypothesis to r', as justified below) {proofstep*} parse'(s l \op, r, i){proofstep*} \proofhang= (by definition ofparse'
, the parser reduces, because by hypothesis, \op is a top-level operator fromunparse e
and therefore noparens(\op, irator i, LEFT)) {proofstep*} parse'(s, APP(l, \op, r), i){proofstep*} = (by assumption) {proofstep*} parse'(s, e, i){proofstep*}It remains to justify applying the induction hypothesis to l' and r'. By definition of
unparse
andbracket
, either l' = unparse l \qquador\qquadl' = parenthesize(unparse l).In the latter case, l' = IMAGE(PARENS(unparse l), EOI), so by the definitions of a_l and t_l, we have a_l = PARENS(unparse l)and t_l = EOI, and {proofstep*} parse'(s,parseAtom
a_l, t_l \opa_r t_r i){proofstep*} = (by assumption and by definition ofparseAtom
) {proofstep*} parse'(s, parse(unparse l), \opa_r t_r i){proofstep*} \proofhang= (by the induction hypothesis applied to l, with empty stack and inputs) {proofstep*} parse'(s, l, \opa_r t_r i){proofstep*} which is the step made above.The case where l' = unparse lis the interesting case. {proofstep*} parse'(s,
parseAtom
a_l, t_l \opa_r t_r i){proofstep*} \proofhang= (For all top-level operators \op' in t_l, noparens(\op', \op, LEFT), and we apply the induction hypothesis with the inputs \opr' i, concluding that) {proofstep*} parse'(s,parseAtom
a_l, t_l \opa_r t_r i){proofstep*} We must show noparens(\op', \op, LEFT)for any top-level \op' from t_l. If \op is non- or right-associative, then by Proposition [->], \prec\op' > \prec\op, and noparens(\op', \op, LEFT)follows. If \op is left-associative, then {proofstep*} noparens(\op', \op, LEFT){proofstep*} \impliedby (by definition ofnoparens
and left-associativity of \op) {proofstep*} \op' binds as tightly as \op {proofstep*} \impliedby (transitivity) {proofstep*} \op' binds as tightly as \op_l \land\op_l binds as tightly as \op {proofstep*} Where unparse' l = (\op_l, a_l t_l)and the two clauses follow by applying Proposition [->] to l and e. {proof}{prop} \ifshortproof[*]If an expression e = APP(l, \op, r), then we can choose l'and r'as in unparse', so that unparse e = l' \opr'.
{prop}
- If \op is non-associative, then every top-level operator in l'and r'has a precedence greater than that of \op.
- If \op is left-associative, then every top-level operator in r'has a precedence greater than that of \op.
- If \op is right-associative, then every top-level operator in l'has a precedence greater than that of \op.
\ifshortproof
{proof} First, do case analysis on l' = bracket(unparse' l, LEFT, \op). If l' is parenthesized, then (3) and half of (1) hold vacuously because l' has no top-level operators. Otherwise, we can assume that unparse' l = (\op_l, l'), and {proofstep*} For all top-level operators \op'in l', \prec\op'>\prec\op {proofstep*} \impliedby (introducing \op_l) {proofstep*} For all top-level operators \op'in l', \prec\op'>=\prec\op_l
\preland\prec\op_l > \prec\op{proofstep*} \impliedby (definitions of ``covered'' and ``binds as tightly as'') {proofstep*} Fragment (\op_l, l')is covered \land \prec\op_l > \prec\op{proofstep*} \impliedby (definition ofnoparens
) {proofstep*} Fragment (\op_l, l')is covered
\prelandnoparens((\op_l, l'), \op, LEFT) \land\assoc\op!=LEFT{proofstep*} \impliedby (definition ofbracket
) {proofstep*} Fragment (\op_l, l')is covered
\prelandbracket((\op_l, l'), \op, LEFT)is not parenthesized \land \assoc\op!=LEFT {proofstep*} = (by assumption) {proofstep*}unparse l
is covered
\prelandl'is not parenthesized \land \assoc\op!=LEFT{proofstep*} \impliedby (Proposition [->], assumption) {proofstep*} \assoc\op!=LEFT{proofstep*} Which proves (3) and half of (1). A similar argument proves (2) and the other half of (1). {proof}{prop}\ifshortproof[*]For any expression
e
,unparse' e
is properly covered \ifshortproof, as defined below. {prop}\ifshortproof{defn}[Covered, properly covered] [*] A fragment
frag = (rator, im)
is covered if and only if every top-level operator inim
binds as tightly asrator
. A covered fragment is properly covered if and only ifrator = maxrator
or there is at least one top-level operator inim
. {defn} The intuition behind a covered fragment(rator, im)
is that low-precedence operators inim
, which do not bind as tightly asrator
, are safely ``covered'' with parentheses. Proper coverage avoids pathology for fragments that don't have any top-level operators.\ifshortproofProposition [<-] is proved by case analysis on the result returned by
bracket
. Proposition [<-] is proved by structural induction on the expression unparsed. The proof uses this lemma:\ifshortproof {proof} By structural induction. The base case is trivial.
For the induction step, we must prove {proofstep*}
unparse' e
is properly covered {proofstep*} \proofhang= (letting e = l \oprand unparse e = l' \opr'as above) {proofstep*} (\op, l' \opr')is properly covered {proofstep*} \impliedby (\op is a top-level operator of l' \opr') {proofstep*} (\op, l' \opr')is covered {proofstep*} = (definition of ``covered'') {proofstep*} For all top-level operators \op' in l' \opr', \op' binds as tightly as \op{proofstep*} = (distributing) {proofstep*} For all top-level operators \op' in l', \op' binds as tightly as \op
\preland\op binds as tightly as \op
\prelandFor all top-level operators \op' in r', \op' binds as tightly as \op{proofstep*} = (reflexivity of ``binds as tightly as,'' definition of ``covered'') {proofstep*} (\op, l')is covered \land (\op, r')is covered {proofstep*} = (body ofunparse'
) {proofstep*} (\op, bracket(unparse' l, LEFT, \op))is covered
\preland(\op, bracket(unparse' r, RIGHT, \op))is covered {proofstep*} \impliedby (bracketing lemma) {proofstep*} unparse' lproperly covered and unparse' rproperly covered {proofstep*} which is the induction hypothesis. {proof}{lemma}[Bracketing lemma] \ifshortproof[*]If fragment
f
is properly covered, then for any operatorrator
and associativitya
,bf = (rator, bracket(f, a, rator))
is covered. {lemma} \ifshortproofThe lemma is proved by case analysis on whetherbracket
parenthesizes its fragment. The proof uses transitivity of ``binds as tightly as.'' {proof} By case analysis on the result ofbracket
. If the result is parenthesized, thenbf
is trivially covered, because it contains no top-level operators.Otherwise, let us write
f = ($, im)
. Our goal is to show that {proofstep}bf = (rator, bracket(($, im), a, rator)
is covered. {proofstep} \impliedby (by the assumption that the result ofbracket
isim
) {proofstep}(rator, im)
is covered {proofstep} === (by Definition [<-]) {proofstep} \forallrator, \rator binds as tightly asrator
{proofstep} \impliedby (by transitivity of ``binds as tightly as'') {proofstep} \forallrator, \rator binds as tightly as$
\land$
binds as tightly asrator
{proofstep} \impliedby (pulling second clause out of \forall and strengthening it) {proofstep} (\forallrator,\rator binds as tightly as$
)
\preland$
binds as tightly asrator
\land\assoc\dollar=a
{proofstep} === (by definitions of ``covered'' andnoparens
) {proofstep}($, im)
is covered
\prelandnoparens($, rator, a)
{proofstep} \impliedby (by definition of ``properly covered'' and ofbracket
) {proofstep}($, im)
is properly covered
\prelandbracket(($, im),
\assoc\dollar, rator)
is not parenthesized {proofstep} both of which are true by assumption. {proof}\proofstext The proof is not as simple as it would have been if
parse
andunparse
had been designed in tandem with a proof, butparse
andunparse
were designed to have simple implementations, not a simple proof. One reason the proof is complex is that the functions are not full inverses; because parsing leaves no record of redundant parentheses, unparse(parse e)!=e. Moreover, although the functions are partial inverses, the computations are not inverses. The parser is an LR parser, working left to right with an explicit stack. The unparser works not left to right but bottom-up, with an implicit stack created by recursive calls.An alternative approach to unparsing might begin with a parser and attempt to derive an unparser using the program-transformation techniques known as program inversion.[cite gries:science, chen:inversion, von-wright:inversion][*] If successful, this attempt would produce an unparser that would be correct by construction, but it would look substantially different from the one presented here. Program-inversion rules have been developed for simple programming calculi, not real programming languages, so the unparser would have to be translated into a programming language in order to be executed. The unparsers in this paper are executable directly, and they are designed to be easy to understand, even if that design makes the proof of correctness harder to understand.
Adding prefix and postfix operators
Simple infix expressions are inadequate for many real programming languages, including C and ML. In this section, we add unary prefix and postfix operators, as well as infix operators of arbitrary arity. Moreover, we use the Standard ML modules system to generalize our idea of what an image is, so we can produce output for a prettyprinter, not just text.
Fixity distinguishes prefix and postfix operators from infix operators. Only infix operators have associativity.
<assoc.sml>= structure Assoc = struct datatype associativity = LEFT | RIGHT | NONASSOC datatype fixity = PREFIX | POSTFIX | INFIX of associativity endDefinesAssoc
,associativity
,fixity
(links are to index).
From these definitions, the possible values of type
fixity
arePREFIX
,POSTFIX
,INFIX LEFT
,INFIX RIGHT
, andINFIX NONASSOC
.Having prefix and postfix operators means that precedence alone no longer determines the correct parse, because the order of two prefix or of two postfix operators overrides precedence. For example, if
++
and*
are prefix operators andp
is an atom, then no matter what the precedences of++
and*
,++*p
and*++p
are both correct and unambiguous, equivalent to++(*p)
and*(++p)
, respectively.Infix operators of arbitrary arity broaden our view of non-associative operators. For example, in both ML and C, the following three expressions are all legal, but all different:
The comma operator is an n-ary, infix, non-associative operator.f((a, b), c)
!=f(a, b, c)
!=f(a, (b, c))
Operators now have a concrete representation, a precedence, and a fixity.
<general types>= (U-> U->) [D->] type precedence = Atom.Precedence.precedence type rator = Atom.atom * precedence * Assoc.fixityDefinesprecedence
,rator
(links are to index).
In expressions, operators may be unary, binary, or n-ary.
<type of expressions>= (U-> U->) [D->] datatype ast = ATOM of Atom.atom | UNARY of rator * ast | BINARY of ast * rator * ast | NARY of rator * ast listDefinesast
(links are to index).
List types are built in to ML. Type
'a list
means ``a list of values of type'a
,'' for any type'a
. It is convenient to treat NARY(\oplus, [e]) as equivalent to e for any \oplus;[e]
is the singleton list containinge
.In this new implementation, atomic components of expressions are no longer limited to strings, but may be any values of any type. That type is given the name
Atom.atom
, but it is not otherwise specified. The user of the unparser can pick any convenient type, e.g., prettyprinter directives. The structureAtom
is a parameter to the unparser module. It must provide the following types and values:<properties of structureAtom
>= (U->) [D->] type atom (* representation of operator or atomic expression *) val parenthesize : atom list -> atomDefinesatom
,parenthesize
(links are to index).
The unparser uses
Atom.parenthesize
to parenthesize a list of atoms.The chunk
<properties of structure
exploits a feature of the Standard ML modules system; in a signature, we can name types and values without giving their definitions. The unparser module is parameterized by a module calledAtom
>Atom
, and it works with anyAtom
that provides a typeatom
, a functionparenthesize
, and the other values listed in<properties of structure
. This parameterization makes the unparser general and easy to reuse.Atom
>To make this general capability as useful as possible, we make it possible to have the unparser add ``decorations'' to list of atoms. We do so by adding another kind of expression.
<type of expressions>+= (U-> U->) [<-D] | DECORATE of (Atom.atom list -> Atom.atom) * astThe
DECORATE
constructor attaches a ``decoration function'' to an expression. The decoration function must transform a list of atoms into a single atom in a way that is indetectible by the parser. For example, a decoration function might add values of typeAtom.atom
that indicate changes in indentation, color, or font of the unparsed text. These values would appear to the parser as white space.Finally, the user of the unparser must supply minimum and maximum precedences, which are used to define the sentinels
minrator
andmaxrator
.<properties of structureAtom
>+= (U->) [<-D->] structure Precedence : sig type precedence val min : precedence (* precedence below that of any operator *) val max : precedence (* precedence above that of any operator *) val compare : precedence * precedence -> order end val bogus : atom (* bogus atom used to make sentinel operators *)Definesbogus
,compare
,max
,min
,Precedence
,precedence
(links are to index).
In many applications,
minprec
andmaxprec
can be computed automatically.The datatype
ast
and the structureAtom
form the external interfaces to the unparser. In the implementation, images are more general---operators and atomic terms no longer alternate, but they are still properly parenthesized: {production}image \sequence\termoperator | \ntlexical-atom | \ntdecorated-image {production} \productionglue{production}lexical-atom \termatom \vbar \lit( \ntimage \lit){production} The corresponding ML types are:<general types>+= (U-> U->) [<-D] datatype image = IMAGE of lexeme list and lexeme = RATOR of rator | EXP of lexical_atom | DECORATED of (Atom.atom list -> Atom.atom) * image and lexical_atom = LEX_ATOM of Atom.atom | PARENS of imageDefinesimage
,lexeme
,lexical_atom
(links are to index).
where
lexeme
corresponds to \alternate\termoperator | \ntlexical-atom | \ntdecorated-image.Type
image
is internal; the unparser returns a list of atoms. It usesflatten
to convert animage
into such a list. It usesAtom.parenthesize
for parenthesization. \addboximage : image -> Atom.atom list \addboxlexeme : lexeme -> Atom.atom \addboxlexical_atom : lexical_atom -> Atom.atom \addboxflatten : image -> Atom.atom list<general>= (U->) [D->] local fun image (IMAGE l) = map lexeme l and lexeme (RATOR (atom, _, _)) = atom | lexeme (EXP a) = lexical_atom a | lexeme (DECORATED (f, im)) = f (image im) and lexical_atom (LEX_ATOM atom) = atom | lexical_atom (PARENS im) = Atom.parenthesize (image im) in val flatten = image (*unboxval*) endDefinesflatten
,image
,lexeme
,lexical_atom
(links are to index).
Each local function computes an atom or list of atoms that corresponds to a value of one type.[*] By programming convention, the name of each function is the name of the type it operates on. Because
local
hides these functions,flatten
is the only name visible outside this code chunk.
Parenthesizing general expressions
As in the simpler case, development of an unparser begins with the invariants that apply to an abstract syntax tree produced by parsing an image without parentheses. There are now three cases to consider: an operator may be the left child of a binary infix operator, the right child of a binary infix operator, or the only child of a unary prefix or postfix operator.
[picture] \drawline(54,-16)(50,-26) \dashline1(56,-16)(60,-26) \qquad\qquad[picture] \dashline1(54,-16)(50,-26) \drawline(56,-16)(60,-26) \qquad\qquad[picture] \drawline(55,-16)(55,-26)As before, the rules for parenthesization are encapsulated in anoparens
function. Because these trees are obtained by parsing parenthesis-free syntax, in case (a), noparens(\op_l, \op, LEFT); in case (b), noparens(\op_r, \op, RIGHT); and in case (c), noparens(\op', \op, NONASSOC). \addboxnoparens : rator * rator * Assoc.associativity -> bool \deeperbox2pt<general>+= (U->) [<-D->] fun noparens(inner as (_, pi, fi), outer as (_, po, fo), side) = case Atom.Precedence.compare (pi, po) of GREATER => true (* (a), (b), or (c) *) | pdiff => case (fi, side) of (POSTFIX, LEFT) => true (* (a) *) | (PREFIX, RIGHT) => true (* (b) *) | (INFIX LEFT, LEFT) => pdiff = EQUAL andalso fo = INFIX LEFT (* (a) *) | (INFIX RIGHT, RIGHT) => pdiff = EQUAL andalso fo = INFIX RIGHT (* (b) *) | (_, NONASSOC) => fi = fo (* (c) *) | _ => false (*unboxval*)Definesnoparens
(links are to index).
These rules can be understood as follows:
These rules for parenthesization, as embodied in the
- If the inner operator has higher precedence, parentheses are not needed.
- If a postfix operator appears to the left of an infix operator (a), it always applies to the preceding expression, and parentheses are not needed. Similarly when a prefix operator appears to the right of an infix operator (b).
- In case (a), if \op_l has lower precedence than \op, then parentheses are needed, but if the two have the same precedence, then parentheses can be avoided, provided both are left-associative. A similar argument applies to case (b) when both operators are right-associative.
- Finally, in case (c), if the precedence of \op' is not greater than the precedence of \op, then parentheses can be avoided if and only if both operators have the same fixity, i.e., both are prefix operators or both are postfix operators. This is the case in the example of
++*p
used to introduce this section.noparens
predicate, are the key to understanding both parsing and unparsing algorithms. Thenoparens
function is used in an unparser, which appears below, and in a parser, which appears in Appendix [->].As before, the unparser needs auxiliary functions to manipulate images. Because an image is now simply a list of lexemes, these functions are a bit simpler. \addboximage : Atom.atom -> image \addboxparenthesize : image -> image \addboxinfixImage : image * rator * image -> image \addboxprefixImage : rator * image -> image \addboxpostfixImage : image * rator -> image
<general>+= (U->) [<-D->] fun image a = IMAGE [EXP (LEX_ATOM a)] fun parenthesize image = IMAGE [EXP (PARENS image)] fun infixImage (IMAGE l, $, IMAGE r) = IMAGE (l @ RATOR $ :: r) fun prefixImage ( $, IMAGE r) = IMAGE ( RATOR $ :: r) fun postfixImage(IMAGE l, $ ) = IMAGE (l @ RATOR $ :: []) (*unboxval*)Definesimage
,infixImage
,parenthesize
,postfixImage
,prefixImage
(links are to index).
where
@
,::
, and[]
are list append, cons, and the empty list, which are all predefined in ML.We use the same
bracket
to parenthesize sub-expressions. \addboxbracket : fragment * Assoc.associativity * rator -> image \deeperbox2pt<general>+= (U->) [<-D->] type fragment = rator * image fun bracket((inner, image), side, outer) = if noparens(inner, outer, side) then image else parenthesize image (*unboxval*)Definesbracket
,fragment
(links are to index).
The structure of the unparser is as before: \addboxunparse : ast -> Atom.atom list
<general>+= (U->) [<-D] local val maxrator = (Atom.bogus, Atom.Precedence.max, INFIX NONASSOC) exception Impossible <functionunparse'
, to map an expression to a fragment> in fun unparse e = let val ($, im) = unparse' e in flatten im end (*unboxval*) endDefinesImpossible
,maxrator
,unparse
(links are to index).
The unparsing function itself has new cases: the unary operator, the n-ary operator, and the decorated expression. The unary operator appears before or after its operand, depending on its fixity. \addboxunparse' : ast -> fragment
<functionunparse'
, to map an expression to a fragment>= (<-U) fun unparse' (ATOM a) = (maxrator, image a) | unparse' (BINARY(l, $, r)) = let val l' = bracket (unparse' l, LEFT, $) val r' = bracket (unparse' r, RIGHT, $) in ($, infixImage(l', $, r')) end | unparse' (UNARY($, e)) = let val e' = bracket (unparse' e, NONASSOC, $) val (_, _, fixity) = $ in ($, case fixity of PREFIX => prefixImage ($, e') | POSTFIX => postfixImage(e', $) | INFIX _ => raise Impossible) end | unparse' (NARY(_, [])) = raise Impossible | unparse' (NARY($, [e])) = unparse' e | unparse' (NARY($, e::es)) = let val leftmost = bracket(unparse' e, LEFT, $) fun addOne(r, leftPrefix) = infixImage(leftPrefix, $, bracket(unparse' r, RIGHT, $)) in ($, foldl addOne leftmost es) end | unparse' (DECORATE (f, e)) = let val ($, im) = unparse' e in ($, IMAGE [DECORATED (f, im)]) end (*unboxval*)Definesunparse'
(links are to index).
The interesting case for the n-ary operator is covered by the pattern
NARY($, e::es)
, which represents the operator$
applied to more than one operand. The code inserts$
between the operands. The first operand,e
, is used to makeleftmost
, an initial ``left prefix.'' TheaddOne
function extends the left prefix, adding operator and operand on the right. The built-infoldl
function walks the listes
, usingaddOne
to extend the left prefix with the remaining operands. Because bothLEFT
andRIGHT
are used in the arguments tobracket
, operator must be non-associative for the operands to be parenthesized properly. Of course, no other associativity makes sense, because a left- or right-associative operator can always be treated as an infix binary operator. The n-ary case is needed only for non-associative operators.
Using the unparser
Before the unparser can be used, it has to be instantiated with a suitable
Atom
module. Once the user creates anAtom
module, he or she combines it with the unparser's implementation to createUnparser
, an instance of the unparser that is specialized to that particularAtom
.[*] This instance provides the datatypeast
, with its contructorsATOM
,UNARY
, etc. It also provides an unparsing function.<values exported by the unparser>= (U->) val unparse : ast -> Atom.atom listDefinesunparse
(links are to index).
Once the unparser is instantiated, it can be used in a real tool, where unparsing is usually one of a series of transformations. In a typical application generator, a program being generated passes through several representations:
This section discusses transformation from an internal representation into an unparsing tree, using examples from the New Jersey Machine-Code Toolkit.[cite ramsey:jersey] The first part of this paper discusses transformation from an unparsing tree to a list of atoms. References \refnooppen:prettyprinting and \refnohughes:pretty-printing discuss transformation of prettyprinting directives to indented text.
- an application-specific internal representation,
- an unparsing tree of type
Unparser.ast
,- a list of atoms of type
Atom.atom list
, which typically includes prettyprinting directives as well as text, and- the final program, represented as nicely indented text in the target language.
A preliminary step in creating unparsing trees is assigning precedence and associativity to the operators of the target language. This can be done easily and reliably by putting operators in a data structure that indicates precedence implicitly and fixity (which includes associativity) explicitly. A fragment of the structure used for C is {alltt} ...(L, [">>", "<<"]), (L, ["+", "-"]), (L, [" ...{alltt} where
L
is an abbreviation forAssoc.INFIX Assoc.LEFT
, ``infix left'' being one of the five possible fixities. Precedence increases as we move down the structure. This data structure is transformed into functionsCprec
andCfixity
that return the precedence and fixity of given operators. It is also used to compute the values ofminprec
andmaxprec
, which are needed to create an unparser.Supplying
minprec
andmaxprec
to the unparser may seem awkward.[*] One could supply instead a list of all operators to be used, letting the unparser computeminprec
andmaxprec
. This technique seems unnecessarily restrictive; it would limit the unparser not only to a single kind of atom, but also to a single set of operators. As written, one unparser can be (and is) used to emit code in different target languages, using different operator sets.Another way to eliminate
minprec
andmaxprec
would be to use a different representation of precedence inside the unparser, e.g., to represent precedence by an ML datatype that included values +\infty and -\infty in addition to all the integers. In a production environment, eliminatingminprec
andmaxprec
from the interface would justify a more complex implementation. For expository purposes, however, another user-defined datatype and special comparison functions would distract readers from the explanation of unparsing.The fixities of operators can be surprising. For example, the C operators
.
and->
, which are used to select members from structures and unions, are superficially binary infix operators, but their right-hand ``arguments'' are not expressions but identifiers, and semantically they are unary operators. The Toolkit in fact treats them as unary postfix operators, choosing their representations dynamically according to what member is being selected. For example, to create an unparsing tree corresponding to the selection of membertag
from structuree
, it uses the function \addboxdot : Unparser.ast * string -> Unparser.ast<tree-creation functions>= (U->) [D->] fun dot (e, tag) = Unparser.UNARY((pptext ("." ^ tag), Cprec ".", Assoc.POSTFIX), e)Definesdot
(links are to index).
Here the ML
^
operator is string concatenation. Thepptext
function takes a string and converts it to anAtom.atom
, making it intelligible to the prettyprinter.Handling mixfix operators may involve unparsing some trees during the construction of other trees. For example, to translate an array-indexing expression into C, one must put the subscript in square brackets. The subscript is unparsed independently and turned into an atom by applying
pplist
. It is then wrapped in square brackets, and the indexing operation itself is represented by the empty string, but with proper precedence and associativity, which, for example, enable the unparser to produce either*p[n]
or(*p)[n]
as required. A similar trick is used to unparse function calls after applying an n-ary comma operator to the arguments. \addboxsubscript : Unparser.ast * Unparser.ast -> Unparser.ast<tree-creation functions>+= (U->) [<-D] fun subscript(array, index) = let val index = pplist (Unparser.unparse index) val index = pplist [pptext "[", index, pptext "]"] in Unparser.BINARY ( array , (pptext "", Cprec "subscript", L) , Unparser.ATOM index ) end (*unboxval*)Definessubscript
(links are to index).
pplist
concatenates atoms; it is also used to defineAtom.parenthesize
:fun parenthesize l = pplist [pptext "(", pplist l, pptext ")"]
Use and availability
The unparser described in this paper is used in two application generators.[cite ramsey:embedded, ramsey:jersey] Because the paper is a literate program, the code used in the generators is extracted directly from the source text of the paper, and it is available from the author's Web site, http://www.cs.virginia.edu/~nr. Practitioners who can't use this code directly can build unparsers by implementing the
noparens
predicate and writingunparse
as a bottom-up tree walk. Appendix [->] of this paper presents the inverse of the general unparser, i.e., a parser that handles prefix and postfix as well as infix operators, all of dynamically chosen precedence and associativity. This parser, which is used in one of the application generators, is also available online. Appendix [->] presents the full proof of correctness of the simple unparser.\acknowledgements This work was supported in part by NSF grant ASC-9612756 and by the US Department of Defense under contract MDA904-97-C-0247 and DARPA order number E381. The views expressed are my own and should not be taken to be those of the DoD.
I thank Referee A for demanding improvements in presentation.
A parser for general expressions
[*] The parser shown here is the inverse of the unparser in the body of the paper. It handles prefix, postfix, and infix operators. It also supports concrete syntax in which concatenation of two expressions signifies the application of a ``juxtaposition'' operator to the two expressions. Like other binary operators, juxtaposition has precedence and associativity. For example, in Standard ML, juxtaposition denotes function application, it associates to the left, and it has a precedence higher than that of any explicit infix operator.[cite milner:definition] In awk, juxtaposition denotes string concatenation, and it has a precedence higher than the comparison operators but lower than the arithmetic operators.[cite aho:awk]
The parser handles juxtaposition by inserting an implicit ``juxtaposition operator'' whenever two expressions are juxtaposed in the input. (The unparser handles juxtaposition by using a binary infix operator whose representation is white space.) The user supplies a specification of this operator as part of the structure
Atom
. The specification includes an associativity, not a fixity, because juxtaposition is infix by definition.<properties of structureAtom
>+= (U->) [<-D] val juxtapositionSpec : atom * Precedence.precedence * Assoc.associativityDefinesjuxtapositionSpec
(links are to index).
Internally, the parser converts this specification into an infix operator.
<general parser>= (U-> U->) [D->] local val (rep, prec, assoc) = Atom.juxtapositionSpec in val juxtarator = (rep, prec, Assoc.INFIX assoc) endDefinesjuxtarator
(links are to index).
The parser's main data structure is a stack of operators and expressions satisfying these invariants:
These invariants are only partially enforced by the type system. All operators have the same type, so
- No postfix operator appears on the stack.
- If a binary operator is on the stack, its left argument is below it.
- An expression on the stack is preceded by a list of prefix operators, then either an infix operator or the bottom of the stack.
prefixop
andinfixop
are only hints.<general parser>+= (U-> U->) [<-D->] type infixop = rator type prefixop = rator datatype stack = BOT | BIN of stack * prefixop list * ast * infixop type stack_top = (prefixop list * ast)Definesinfixop
,prefixop
,stack
,stack_top
(links are to index).
The sentinel operator
minrator
appears on the bottom of the stack: \addboxsrator : stack -> rator<general parser>+= (U-> U->) [<-D->] val minrator = (Atom.bogus, Atom.Precedence.min, INFIX NONASSOC) fun srator (BIN (_, _, _, $)) = $ | srator BOT = minrator (*unboxval*)Definesminrator
,srator
(links are to index).
The parser's input looks something like this: {production}image \sequence\termprefix\ntlexical-atom\sequence\termpostfix \sequence\optional\terminfix \sequence\termprefix\ntlexical-atom\sequence\termpostfix {production}
In parsing, we assume that we can distinguish prefix, postfix, and infix operators. If juxtaposition is forbidden, these restrictions can be relaxed somewhat; it is then sufficient to be able to distinguish postfix from infix operators. The adjustments needed are left as an exercise.
isPrefix
identifies prefix operators. \addboxisPrefix : rator -> bool<general parser>+= (U-> U->) [<-D->] fun isPrefix (_, _, f) = f = PREFIX (*unboxval*)DefinesisPrefix
(links are to index).
The parser uses two functions,
parsePrefix
andparsePostfix
, depending on whether it is scanning before or after an atomic input. The overall structure of the parser is \addboxparse : image -> ast<general parser>+= (U-> U->) [<-D] exception ParseError of string * rator list local exception Impossible <general parsing functions> in val parse = parse (*unboxval*) endDefinesImpossible
,parse
,ParseError
(links are to index).
There are many cases.
parsePrefix
maintains a list of prefix operators, which is saves until it reaches an atomic input. It is not safe to reduce the prefix operators, because the atomic input might be followed by a postfix operator of higher precedence. IfparsePrefix
encounters a non-prefix operator or end of file, the input is not correct. WhenparsePrefix
encounters an atomic input, it passes the input and the saved prefixes toparsePostfix
. \addboxparsePrefix : stack * prefixop list * lexeme list -> ast<general parsing functions>= (<-U) [D->] fun parsePrefix(stack, prefixes, RATOR $ :: ipts') = if isPrefix $ then parsePrefix(stack, $ :: prefixes, ipts') else raise ParseError("%s is not a prefix operator", [$]) | parsePrefix(_, _, []) = raise ParseError ("premature end of expression", []) | parsePrefix(stack, prefixes, EXP a :: ipts') = parsePostfix(stack, (parseAtom a, prefixes), ipts') | parsePrefix(stack, prefixes, DECORATED (_, IMAGE l) :: ipts') = parsePrefix(stack, prefixes, l @ ipts')DefinesparsePrefix
(links are to index).
If it encounters decorated lexemes,
parsePrefix
ignores the decoration.In addition to the stack and the input,
parsePostfix
maintains a current expressione
, and a listprefixes
containing unreduced prefix operators that immediately precedee
. The idea is that operators arriving in the input must be tested either against the nearest unreduced prefix operator, or if there are no unreduced prefix operators, against the operator on the top of the stack. In the first case, the nearest unreduced prefix operator is called$
, and it may be compared with an operatorirator
from the input. If$
has higher precedence, it is reduced, by usingUNARY
to apply it to the current expressione
, and it becomes the new expression. Otherwise, the parser's behavior depends on the fixity ofirator
; it may reduceirator
, shift it, or insertjuxtarator
. \addboxparsePostfix : stack * (ast * prefixop list) * lexeme list -> ast<general parsing functions>+= (<-U) [<-D->] and parsePostfix(stack, (e, $ :: prefixes), ipts as RATOR (irator as (_, _, ifixity)) :: ipts') = if noparens($, irator, NONASSOC) then (* reduce prefix $ *) parsePostfix(stack, (UNARY($, e), prefixes), ipts) else if noparens(irator, $, NONASSOC) then (* irator has higher precedence *) case ifixity of POSTFIX => (* reduce postfix rator *) parsePostfix(stack, (UNARY(irator, e), $ :: prefixes), ipts') | INFIX _ => (* shift, look for prefixes *) parsePrefix(BIN(stack, $ :: prefixes, e, irator), [], ipts') | PREFIX => <insertjuxtarator
> else raise ParseError ("can't parse (%s e %s); " ^ "operators have equal precedence", [$, irator])DefinesparsePostfix
(links are to index).
If
parsePostfix
encounters an atom in the input, it insertsjuxtarator
no matter what the state of unreduced prefixes, since consuming atoms is the purview ofparsePrefix
.<general parsing functions>+= (<-U) [<-D->] | parsePostfix(stack, (e, prefixes), ipts as EXP _ :: _) = <insertjuxtarator
>The insertion itself is straightforward.
<insertjuxtarator
>= (<-U <-U U->) parsePostfix(stack, (e, prefixes), RATOR juxtarator :: ipts)The other major case for
parsePostfix
occurs when there are no more unreduced prefix operators. In that case, comparison must be made withsrator stack
, the operator on top of the stack.<general parsing functions>+= (<-U) [<-D->] | parsePostfix(stack, (e, prefixes as []), ipts as RATOR (irator as (_, _, ifixity)) :: ipts' ) = if noparens(srator stack, irator, LEFT) then (* reduce infix on stack *) case stack of BIN (stack', prefixes, e', $) => parsePostfix(stack', (BINARY(e', $, e), prefixes), ipts) | BOT => raise Impossible (* BOT has lowest prec *) else if noparens(irator, srator stack, RIGHT) then case ifixity of POSTFIX => (* reduce postfix rator *) parsePostfix(stack, (UNARY(irator, e), []), ipts') | INFIX _ => (* shift, look for prefixes *) parsePrefix(BIN(stack, [], e, irator), [], ipts') | PREFIX => <insertjuxtarator
> else raise ParseError ("%s is non-associative", [irator])As above, decorations are ignored.
<general parsing functions>+= (<-U) [<-D->] | parsePostfix (stack, (e, prefixes), DECORATED (_, IMAGE l) :: ipts') = parsePostfix(stack, (e, prefixes), l @ ipts')When the input is exhausted, the parser reduces first the prefixes, then the stack, until it reaches the result.
<general parsing functions>+= (<-U) [<-D->] | parsePostfix(stack, (e, $ :: prefixes), []) = (* reduce prefix $ *) parsePostfix(stack, (UNARY($, e), prefixes), []) | parsePostfix(BIN (stack', prefixes, e', $), (e, []), []) = (* reduce stack *) parsePostfix(stack', (BINARY(e', $, e), prefixes), []) | parsePostfix(BOT, (e, []), []) = eWe complete the parser with functions that parse an entire input, or an input in parentheses. \addboxparse : image -> ast \addboxparseAtom : lexical_atom -> ast \deeperbox2pt
<general parsing functions>+= (<-U) [<-D] and parse(IMAGE(l)) = parsePrefix(BOT, [], l) and parseAtom (LEX_ATOM a) = ATOM a | parseAtom (PARENS im) = parse im (*unboxval*)Definesparse
,parseAtom
(links are to index).
Proofs of correctness
[*] \proofstext [BibTeX bibliography]
<common prefix>= (U->) val minprec = ~1 val maxprec = 101Definesmaxprec
,minprec
(links are to index).
<infix.sml>= structure InfixUnparse = struct <common prefix> <infix> infix 7 ** infix 6 ++ infix 4 == infixr 3 ::== val mul = ("*", 7, LEFT) val add = ("+", 6, LEFT) val eq = ("=", 4, NONASSOC) val gets = (":=", 3, RIGHT) fun apply $ (l, r) = APP (l, $, r) val op ** = apply mul val op ++ = apply add val op == = apply eq val op ::== = apply gets fun imstring (IMAGE(a, tail)) = astring a ^ imstring' tail and imstring' EOI = "" | imstring' (INFIX ((t, _, _), a, tail)) = " " ^ t ^ " " ^ astring a ^ imstring' tail and astring (LEX_ATOM a) = a | astring (PARENS im) = "(" ^ imstring im ^ ")" val x = ATOM "x" val y = ATOM "y" val z = ATOM "z" fun E (n:int) = ATOM (Int.toString n) val junk = imstring (unparse (z ::== y ::== x ++ y ** z ++ E 2 == E 99 == x)) val junk' = imstring (unparse (z ::== y ::== (x ++ y) ** (z ++ E 2) == E 99 == x)) val left = imstring (unparse ( (x ++ y) ++ z )) val right = imstring (unparse ( x ++ (y ++ z ))) val left' = imstring (unparse ( (x ::== y) ::== z )) val right' = imstring (unparse ( x ::== (y ::== z ))) val left'' = imstring (unparse ( (x == y) == z )) val right'' = imstring (unparse ( x == (y == z ))) endDefinesadd
,apply
,astring
,E
,eq
,gets
,imstring
,imstring'
,InfixUnparse
,junk
,junk'
,left
,left'
,left''
,mul
,op
,right
,right'
,right''
,x
,y
,z
(links are to index).
<unparse.sig>= signature UNPARSEABLE_ATOM = sig <properties of structureAtom
> end signature UNPARSEABLE_PRECEDENCE = sig type precedence val min : precedence (* precedence below that of any operator *) val max : precedence (* precedence above that of any operator *) val compare : precedence * precedence -> order end signature UNPARSER = sig structure Atom : UNPARSEABLE_ATOM <general types> <type of expressions> <values exported by the unparser> endDefinesAtom
,compare
,max
,min
,precedence
,UNPARSEABLE_ATOM
,UNPARSEABLE_PRECEDENCE
,UNPARSER
(links are to index).
<unparse.sml>= functor UnparserFun(Atom: UNPARSEABLE_ATOM) = struct structure Atom = Atom open Assoc <general types> <type of expressions> <general> <general parser> endDefinesAtom
,UnparserFun
(links are to index).
I really bend over backwards, including paying a run-time cost, to use this very concrete parser with more abstract streams.
<exp-stream.sig>= signature EXP_STREAM = sig type operator val juxtarator : operator val fixity : operator -> Assoc.fixity structure Precedence : UNPARSEABLE_PRECEDENCE val precedence : operator -> Precedence.precedence type exp val unary : operator * exp -> exp val binary : exp * operator * exp -> exp val nary : operator * exp list -> exp datatype token = RATOR of operator | EXP of exp endDefinesbinary
,exp
,EXP_STREAM
,fixity
,juxtarator
,nary
,operator
,Precedence
,precedence
,token
,unary
(links are to index).
<parse.sml>= functor ParserFun(structure ExpStream : EXP_STREAM) = struct structure ES = ExpStream exception Impossible structure A = struct (* used for types, never for unparsing *) datatype atom = RATOR of ES.operator | EXP of ES.exp fun parenthesize l = raise Impossible structure Precedence = ExpStream.Precedence val bogus = RATOR (ES.juxtarator) val juxAtom = bogus val juxPrec = ES.precedence ES.juxtarator val juxAssoc = case ES.fixity ES.juxtarator of Assoc.INFIX a => a | _ => raise Impossible val juxtapositionSpec = (juxAtom, juxPrec, juxAssoc) end structure U = UnparserFun(A) structure P = struct open U open Assoc <general parser> end exception ParseError of string * ES.operator list fun maptoken (ES.RATOR $) = P.RATOR (A.RATOR $, ES.precedence $, ES.fixity $) | maptoken (ES.EXP e) = P.EXP (P.LEX_ATOM (A.EXP e)) fun unrator (A.RATOR $, _, _) = $ | unrator _ = raise ParseError ("This can't happen --- operator is an exp", []) fun unwrap (P.ATOM (A.EXP e)) = e | unwrap (P.ATOM _) = raise ParseError ("This can't happen --- exp is an operator", []) | unwrap (P.UNARY ($, e)) = ES.unary (unrator $, unwrap e) | unwrap (P.BINARY (e1, $, e2)) = ES.binary(unwrap e1, unrator $, unwrap e2) | unwrap (P.NARY ($, l)) = ES.nary (unrator$, map unwrap l) | unwrap (P.DECORATE (_, e)) = unwrap e fun parse l = (unwrap o P.parse o P.IMAGE o map maptoken) l handle P.ParseError (s, l) => raise ParseError (s, map unrator l) endDefinesA
,ES
,Impossible
,maptoken
,P
,parse
,ParseError
,ParserFun
,U
,unrator
,unwrap
(links are to index).
<unparse-test.sml>= structure TextAtom = struct type atom = string fun parenthesize l = String.concat ("(" :: l) ^ ")" structure Precedence = struct type precedence = int val min = ~1 val max = 101 val compare = Int.compare end val bogus = "bogus sentinel operator" (* juxtaposition is an implicit call *) val juxAtom = " " val juxPrec = Precedence.max - 1 val juxAssoc = Assoc.LEFT val juxtapositionSpec = (juxAtom, juxPrec, juxAssoc) end structure UnparseTest = struct structure U = UnparserFun(TextAtom) local structure Unparser = U fun pptext t = t val pplist = String.concat val L = Assoc.INFIX Assoc.LEFT fun Cprec (x : string) = 99 in <tree-creation functions> end open U open Atom infix 7 ** infix 6 ++ infix 4 == infixr 3 ::== val mul = ("*", 7, INFIX LEFT) val add = ("+", 6, INFIX LEFT) val eq = ("=", 4, INFIX NONASSOC) val gets = (":=", 3, INFIX RIGHT) val ince = ("++", 9, PREFIX) val inco = ("++", 9, POSTFIX) val dece = ("--", 9, PREFIX) val deco = ("--", 9, POSTFIX) val star = ("*", 8, PREFIX) val tuple = ("*", 1, INFIX NONASSOC) fun binary $ (l, r) = BINARY (l, $, r) fun unary $ e = UNARY($, e) fun nary $ es = NARY($, es) val op ** = binary mul val op ++ = binary add val op == = binary eq val op ::== = binary gets val dece = unary dece val deco = unary deco val star = unary star val /*/ = nary tuple val imstring = String.concat val x = ATOM "x" val y = ATOM "y" val z = ATOM "z" fun E n = ATOM (Int.toString n) val junk = imstring (unparse (z ::== y ::== x ++ y ** z ++ E 2 == E 99 == x)) val junk' = imstring (unparse (z ::== y ::== (x ++ y) ** (z ++ E 2) == E 99 == x)) val left = imstring (unparse ( (x ++ y) ++ z )) val right = imstring (unparse ( x ++ (y ++ z ))) val left' = imstring (unparse ( (x ::== y) ::== z )) val right' = imstring (unparse ( x ::== (y ::== z ))) val left'' = imstring (unparse ( (x == y) == z )) val right'' = imstring (unparse ( x == (y == z ))) val pp = imstring (unparse (E 2 ++ star (deco x))) val pp' = imstring (unparse (E 2 ++ deco (star x))) val tuplel = imstring (unparse (/*/ [/*/ [x, y], z])) val tupler = imstring (unparse (/*/ [x, /*/ [y, z]])) val tuplen = imstring (unparse (/*/ [x, y, z])) endDefines/*/
,add
,atom
,binary
,bogus
,dece
,deco
,E
,eq
,gets
,imstring
,ince
,inco
,junk
,junk'
,juxAssoc
,juxAtom
,juxPrec
,juxtapositionSpec
,left
,left'
,left''
,mul
,nary
,op
,parenthesize
,pp
,pp'
,Precedence
,right
,right'
,right''
,star
,TextAtom
,tuple
,tuplel
,tuplen
,tupler
,U
,unary
,UnparseTest
,x
,y
,z
(links are to index).
Index
Chunks
- <assoc.sml>: D1
- <common prefix>: D1, U2
- <exp-stream.sig>: D1
- <function
unparse'
, to map an expression to a fragment>: U1, D2- <general>: D1, D2, D3, D4, D5, U6
- <general parser>: D1, D2, D3, D4, D5, U6, U7
- <general parsing functions>: U1, D2, D3, D4, D5, D6, D7, D8
- <general types>: D1, D2, U3, U4
- <infix>: D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, U13
- <infix.sml>: D1
- <insert
juxtarator
>: U1, U2, D3, U4- <parse.sml>: D1
- <properties of structure
Atom
>: D1, D2, D3, U4- <shift or reduce, depending on relative precedence of \rators and \ratori>: U1, D2
- <tree-creation functions>: D1, D2, U3
- <type of expressions>: D1, D2, U3, U4
- <unparse-test.sml>: D1
- <unparse.sig>: D1
- <unparse.sml>: D1
- <values exported by the unparser>: D1, U2
Identifiers
- /*/: D1
- A: D1
- add: D1, D2
- append: D1, U2
- apply: D1
- Assoc: D1, U2, U3, U4, U5, U6, U7, U8, U9
- Associativity: D1, U2
- associativity: D1, D2, U3
- ast: D1, U2, U3, D4, U5, U6, U7, U8, U9
- astring: D1
- Atom: U1, U2, U3, U4, U5, U6, U7, U8, U9, U10, U11, U12, U13, U14, U15, D16, D17, U18
- atom: U1, U2, U3, D4, U5, U6, U7, U8, U9, U10, U11, U12, D13
- binary: D1, U2, D3
- bogus: D1, U2, U3, U4, D5
- bracket: U1, D2, U3, U4, U5, D6, U7
- compare: D1, U2, D3, U4
- dece: D1
- deco: D1
- dot: D1
- E: D1, D2
- eq: D1, D2
- ES: D1
- exp: D1, U2
- EXP_STREAM: D1, U2
- fixity: D1, U2, U3, D4, U5
- flatten: U1, D2, U3
- fragment: D1, D2
- gets: D1, D2
- image: U1, D2, D3, U4, U5, U6, U7, D8, U9, D10, D11, U12, U13
- image': U1, D2, U3
- Impossible: D1, U2, D3, U4, D5, U6, D7
- imstring: D1, D2
- imstring': D1
- ince: D1
- inco: D1
- infixImage: D1, U2, D3, U4
- infixop: D1
- InfixUnparse: D1
- irator: D1, U2, U3, U4, U5
- isPrefix: D1, U2
- junk: D1, D2
- junk': D1, D2
- juxAssoc: U1, D2
- juxAtom: U1, D2
- juxPrec: U1, D2
- juxtapositionSpec: D1, U2, U3, D4
- juxtarator: D1, U2, U3, D4, U5
- left: D1, D2
- left': D1, D2
- left'': D1, D2
- lexeme: D1, D2
- lexical_atom: D1, D2, D3
- maptoken: D1
- max: U1, D2, U3, D4, U5
- maxprec: U1, D2
- maxrator: D1, U2, U3, D4, U5
- min: D1, U2, D3, U4
- minprec: U1, U2, D3
- minrator: U1, D2, U3, D4
- mul: D1, D2
- nary: D1, U2, D3
- noparens: U1, D2, U3, U4, U5, U6, U7, U8, D9, U10, U11, U12
- op: D1, D2
- operator: U1, U2, U3, U4, D5, U6, U7
- P: D1
- parenthesize: D1, U2, D3, U4, U5, U6, D7, U8, U9, D10
- parse: U1, D2, U3, U4, D5, U6, D7, D8
- parse': U1, D2, U3, U4, U5
- parseAtom: D1, U2, U3, U4, U5, D6
- ParseError: D1, U2, U3, U4, D5
- parsePostfix: U1, U2, D3, U4, U5, U6, U7, U8
- parsePrefix: D1, U2, U3, U4
- ParserFun: D1
- postfixImage: D1, U2
- pp: D1
- pp': D1
- Precedence: U1, D2, U3, U4, U5, U6, D7, U8, D9
- precedence: D1, U2, U3, U4, D5, D6, U7, U8, D9, D10, U11, U12
- prefixImage: D1, U2
- prefixop: D1
- rator: D1, U2, U3, U4, U5, U6, U7, U8, D9, U10, U11, U12, U13, U14, U15, U16
- right: D1, D2
- right': D1, D2
- right'': D1, D2
- srator: D1, U2, D3, U4
- stack: D1, U2, U3, D4, U5, U6, U7, U8, U9, U10, U11
- stack_top: D1
- star: D1
- subscript: D1
- TextAtom: D1
- token: D1
- tuple: D1
- tuplel: D1
- tuplen: D1
- tupler: D1
- U: D1, D2
- unary: D1, U2, D3
- unparse: U1, D2, U3, D4, D5, U6, U7, U8
- unparse': D1, U2, U3, D4
- UNPARSEABLE_ATOM: D1, U2
- UNPARSEABLE_PRECEDENCE: D1, U2
- UNPARSER: D1
- UnparserFun: D1, U2, U3
- UnparseTest: D1
- unrator: D1
- unwrap: D1
- x: U1, D2, D3
- y: D1, D2
- z: D1, D2