Recursive parsers using fixed-point combinators
When a parser is just a function, it’s easy to define recursively, provided you’re careful to avoid an infinite recursion. But when a parser is an abstraction, we have to use a fixed-point combinator, just as we do in the lambda calculus.
Suppose that an expression exp
can contain subexpressions with the same syntax as exp
. Then we are likely to define it using a recursion equation. Here, for example, is Wirth’s classic grammar for four-function arithmetic:
exp ::= term { ("+" | "-") term }
term :: = factor { ("*" | "/") factor }
factor ::= int | "(" exp ")"
We’re going to want to turn this into a recursion equation
exp = ... exp ...
At which point we can define the parser by
fix (fn exp => ... exp ...)
If we want to make it efficient, we’ll further transform the code as
fix' (fn expref => let val exp = !!expref in ... exp ... end)
Here’s a sketch, not even typechecked:
fix' (fn expref =>
let val exp = !! expref
val factor = int <|> the "(" >> exp <~> the ")"
val term = build <$> factor <*>
many (pair <$> (the "*" <|> the "/") <*> factor)
in build <$> term <*>
many (pair <$> (the "+" <|> the "-") <*> term)
end)
where
pair x y (x, y)
build v [] = v
build v ((symbol, v1) :: pairs) = build (operator symbol (v, v1)) pairs
operator "+" = op +
operator "-" = op -
operator "*" = op *
operator "/" = op div
Error detection
Error detection is hell. It’s not just combinator parsers; Yacc is a mess, and there are many other issues. If you’re curious to see some evidence, start with Clint Jeffery’s paper on error messages in LR parsers. For combinator parsers, there’s not a lot in the professional literature on error detection or reporting. Evan Czaplicki is doing lovely work on Elm, but he’s got a language to grow and he doesn’t publish.
The best practice I know of is to identify when the parser should absolutely commit to a syntactic form, and at that point write something in roughly like this:
p1 >> (* commit point is here *) (p1 <|> perror "syntax error:)
The <|>
and perror
are almost certainly something you want to be done for you by a combinator.
The next level up is instead of just perror
, you write a combinator that can gobble up some bad tokens and maybe get parsing back on track. That’s what I’m doing in my instruction
parser in asmparse.sml
: if things go horribly wrong, my parser eats every token up to the next newline, then continues.
My opinion is that for your very first combinator parser, you shouldn’t have to muck around with error detection. Focus on parsing one line correctly, and if something goes wrong, my instruction
parser should spit out a message that shows the offending line. That should be good enough to go on with.
Debugging
The only way I know of to debug a combinator parser is so-called “printf
debugging.” To get you started, I’ve provided two functions in asmparse.sml
:
Function
debug
, as indebug "looking for arguments"
, returns a parser that prints the given message when it is run. The parser always succeeds and does not read any input.Here is an example of how I used
debug
:pair <$> reg <~> comma <~> debug "saw comma, looking for dots" <~> kw "..." <~> comma <*> reg
This sort of thing is easier to edit if you put every component on a line by itself
succeed pair <*> reg <~> comma <~> debug "saw comma, looking for dots" <~> kw "..." <~> comma <*> reg <~> debug "got 'reg, ..., reg'!!!
I learned this layout from Evan Czaplicki.
Function
verbose
wraps an existing parser in a new parser that prints a message when entered, but otherwise behaves like the original parser. This one is generally more useful as a standalone definition:val literal = verbose "literal?" literal (* redefines `literal` *)
I hope these two are enough. If you need more, the next step is to wrap a parser p
in a new parser that prints a message on entry, runs p
, scrutinizes p
’s result and reports something to standard error, then returns the result:
val veryVerbose : string -> 'a parser -> 'a parser
= (fn what => fn p =>
let fun shout s = app eprint ["looking for ", what, s, "\n"]
in P.ofFunction (fn ts =>
let val _ = shout "..."
val answer = P.asFunction p ts
val _ =
case answer
of NONE => shout ": failed"
| SOME (Error.ERROR _, _) => shout ": errored"
| SOME (Error.OK _, _) => shout ": succeeded"
in answer
end)
end)
Let us hope you don’t need to go there.