Draft

# Recursive parsers using fixed-point combinators

When a parser is just a function, it’s easy to define recursively, provided you’re careful to avoid an infinite recursion. But when a parser is an abstraction, we have to use a fixed-point combinator, just as we do in the lambda calculus.

Suppose that an expression `exp` can contain subexpressions with the same syntax as `exp`. Then we are likely to define it using a recursion equation. Here, for example, is Wirth’s classic grammar for four-function arithmetic:

``````exp  ::= term { ("+" | "-") term }
term :: = factor { ("*" | "/") factor }
factor ::= int | "(" exp ")"``````

We’re going to want to turn this into a recursion equation

``exp = ... exp ...``

At which point we can define the parser by

``fix (fn exp => ... exp ...)``

If we want to make it efficient, we’ll further transform the code as

``fix' (fn expref => let val exp = !!expref in ... exp ... end)``

Here’s a sketch, not even typechecked:

``````fix' (fn expref =>
let val exp = !! expref
val factor = int <|> the "(" >> exp <~> the ")"
val term   = build <\$> factor <*>
many (pair <\$> (the "*" <|> the "/") <*> factor)
in               build <\$> term <*>
many (pair <\$> (the "+" <|> the "-") <*> term)
end)``````

where

``````pair x y (x, y)
build v [] = v
build v ((symbol, v1) :: pairs) = build (operator symbol (v, v1)) pairs
operator "+" = op +
operator "-" = op -
operator "*" = op *
operator "/" = op div``````

# Error detection

Error detection is hell. It’s not just combinator parsers; Yacc is a mess, and there are many other issues. If you’re curious to see some evidence, start with Clint Jeffery’s paper on error messages in LR parsers. For combinator parsers, there’s not a lot in the professional literature on error detection or reporting. Evan Czaplicki is doing lovely work on Elm, but he’s got a language to grow and he doesn’t publish.

The best practice I know of is to identify when the parser should absolutely commit to a syntactic form, and at that point write something in roughly like this:

``p1 >> (* commit point is here *) (p1 <|> perror "syntax error:)``

The `<|>` and `perror` are almost certainly something you want to be done for you by a combinator.

The next level up is instead of just `perror`, you write a combinator that can gobble up some bad tokens and maybe get parsing back on track. That’s what I’m doing in my `instruction` parser in `asmparse.sml`: if things go horribly wrong, my parser eats every token up to the next newline, then continues.

My opinion is that for your very first combinator parser, you shouldn’t have to muck around with error detection. Focus on parsing one line correctly, and if something goes wrong, my `instruction` parser should spit out a message that shows the offending line. That should be good enough to go on with.

# Debugging

The only way I know of to debug a combinator parser is so-called “`printf` debugging.” To get you started, I’ve provided two functions in `asmparse.sml`:

• Function `debug`, as in `debug "looking for arguments"`, returns a parser that prints the given message when it is run. The parser always succeeds and does not read any input.

Here is an example of how I used `debug`:

``````pair <\$> reg <~> comma <~> debug "saw comma, looking for dots"
<~> kw "..." <~> comma <*> reg``````

This sort of thing is easier to edit if you put every component on a line by itself

``````succeed pair
<*> reg
<~> comma
<~> debug "saw comma, looking for dots"
<~> kw "..."
<~> comma
<*> reg
<~> debug "got 'reg, ..., reg'!!!``````

I learned this layout from Evan Czaplicki.

• Function `verbose` wraps an existing parser in a new parser that prints a message when entered, but otherwise behaves like the original parser. This one is generally more useful as a standalone definition:

``val literal = verbose "literal?" literal   (* redefines `literal` *)``

I hope these two are enough. If you need more, the next step is to wrap a parser `p` in a new parser that prints a message on entry, runs `p`, scrutinizes `p`’s result and reports something to standard error, then returns the result:

``````val veryVerbose : string -> 'a parser -> 'a parser
= (fn what => fn p =>
let fun shout s = app eprint ["looking for ", what, s, "\n"]
in  P.ofFunction (fn ts =>
let val _ = shout "..."
val answer = P.asFunction p ts
val _ =