A classic regular expression is empty, a character, a concatenation,
or the closure of a regular expression.
Because it's no harder, we will use ``character classes'' in place of single characters;
a character class is represented, of course, as a function of type [[char -> bool]].
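Below is a sketch of that representation in Standard ML. The datatype [[re]] and the function [[accept]] are hypothetical and are not this document's own code. [[re]] uses a character class of type [[char -> bool]] wherever a classic regular expression would use a single character. [[accept]] is a continuation-passing matcher, which is one standard way to interpret such a datatype.

```sml
(* Hypothetical datatype: regular expressions with character classes
   (predicates on characters) in place of single characters. *)
datatype re
  = Empty
  | Class of char -> bool
  | Cat of re * re
  | Star of re

(* accept r cs k: match some prefix of cs against r, then hand the rest of
   the input to the continuation k, which decides whether the whole match
   succeeds. Returning false makes the caller try another way. *)
fun accept Empty cs k = k cs
  | accept (Class p) (c :: cs) k = p c andalso k cs
  | accept (Class _) [] _ = false
  | accept (Cat (r1, r2)) cs k = accept r1 cs (fn rest => accept r2 rest k)
  | accept (Star r) cs k =
      k cs orelse
      (* insist on progress, so a Star of something matching "" terminates *)
      accept r cs (fn rest => rest <> cs andalso accept (Star r) rest k)

(* Hypothetical driver: the whole string must match, so the final
   continuation accepts only the empty remainder. *)
fun matches (r : re) (s : string) : bool = accept r (String.explode s) null

(* Hypothetical example: one or more decimal digits. *)
val digits = Cat (Class Char.isDigit, Star (Class Char.isDigit))
```

Here [[matches digits "123"]] is true and [[matches digits "12a"]] is false. The [[rest <> cs]] test in the [[Star]] case forces each iteration to consume input, which keeps expressions such as [[Star Empty]] from looping.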
Implementing [[p >>= k]] requires constructing a suitable success
continuation for [[p]].
So who cares? What's the big deal here?
Well, you can't extend a [[datatype]], but you can write new functions!
Part II: The birth of parsing combinators
OK, enough fooling around.
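Let's define a parser from [['a]] to [['b]] as a function
that takes a sequence of values of type [['a]] and does one of
two things: either it succeeds, passing the remaining input, a result of type [['b]],
and a resume continuation (which can be called to backtrack and ask for another result)
to its success continuation, or it fails by calling its failure continuation.
The continuations have these types: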
type 'b fail = unit -> 'b
type 'b resume = unit -> 'b
type ('a, 'b) succ = 'a list -> 'b -> 'b resume -> 'b
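The following sketch shows what the resume continuation is for. It rests on one assumption: that a parser has type [['a list -> ('a, 'b) succ -> 'b fail -> 'b]]. This section implies that type but does not show it. The parser [[oneOrTwo]] and the success continuation [[evenOnly]] are hypothetical examples.

```sml
(* Continuation types from the text. *)
type 'b fail = unit -> 'b
type 'b resume = unit -> 'b
type ('a, 'b) succ = 'a list -> 'b -> 'b resume -> 'b

(* Assumed parser type: input, success continuation, failure continuation. *)
type ('a, 'b) parser = 'a list -> ('a, 'b) succ -> 'b fail -> 'b

(* Hypothetical parser: consumes nothing and offers the answer 1; if
   resumed, it offers 2; if resumed again, it fails. *)
val oneOrTwo : ('a, int) parser =
  fn input => fn succ => fn fail =>
    succ input 1 (fn () => succ input 2 fail)

(* Hypothetical success continuation: keeps an even answer, and otherwise
   calls resume to ask the parser for its next answer. *)
fun evenOnly (_ : char list) (n : int) (resume : int resume) : int =
  if n mod 2 = 0 then n else resume ()

(* The failure continuation supplies ~1 if no acceptable answer exists. *)
val answer = oneOrTwo [] evenOnly (fn () => ~1)
```

Here [[answer]] is [[2]]. Because the success continuation rejects [[1]] by calling resume, the parser gets a chance to produce its next answer instead of failing outright.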
eof : ('a, unit) parser
which succeeds when it has reached end of file and fails otherwise.
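Here is a minimal sketch of [[eof]]. It assumes the parser type [['a list -> ('a, 'b) succ -> 'b fail -> 'b]], which this section does not state. [[eof]] can succeed in only one way, so the resume it passes along is simply its own failure continuation. The ref cell [[outcome]] is a hypothetical way to observe which continuation runs.

```sml
type 'b fail = unit -> 'b
type 'b resume = unit -> 'b
type ('a, 'b) succ = 'a list -> 'b -> 'b resume -> 'b
(* Assumed parser type; not shown in this section. *)
type ('a, 'b) parser = 'a list -> ('a, 'b) succ -> 'b fail -> 'b

(* Succeeds on empty input, fails otherwise. With only one way to succeed,
   the failure continuation doubles as the resume continuation. *)
fun eof (input : 'a list) (succ : ('a, unit) succ) (fail : unit fail) : unit =
  case input of
      [] => succ [] () fail
    | _ :: _ => fail ()

(* Hypothetical usage: record which continuation eof invokes. *)
val outcome = ref ""
val () =
  eof ([] : char list)
      (fn _ => fn () => fn _ => outcome := "succeeded")
      (fn () => outcome := "failed")
```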
return : 'a -> ('a, 'b) parser
which always succeeds and never consumes any input.
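Here is a sketch of [[return]], assuming the parser type declared in the block (this section does not show it). [[return]] calls the success continuation immediately with the input unchanged, and it again uses its failure continuation as the resume. The value [[seen]] is a hypothetical usage check.

```sml
type 'b fail = unit -> 'b
type 'b resume = unit -> 'b
type ('a, 'b) succ = 'a list -> 'b -> 'b resume -> 'b
(* Assumed parser type; not shown in this section. *)
type ('a, 'b) parser = 'a list -> ('a, 'b) succ -> 'b fail -> 'b

(* Succeed exactly once, consuming nothing. *)
fun return (v : 'b) : ('a, 'b) parser =
  fn input => fn succ => fn fail => succ input v fail

(* Hypothetical usage: the success continuation sees the value and the
   untouched input (two characters). *)
val seen : int =
  return 42 [#"a", #"b"] (fn rest => fn v => fn _ => v + length rest) (fn () => ~1)
```

[[seen]] evaluates to [[44]]: the returned [[42]] plus the two characters that were not consumed.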
expect : ('a -> bool) -> ('a, 'a) parser
such that [[expect p]] succeeds if the input is nonempty and the first item in the
input satisfies [[p]] and fails otherwise.
If [[expect p]] succeeds, it returns that first item.
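Here is a sketch of [[expect]], assuming the parser type declared in the block. [[leadingDigit]] is a hypothetical usage. In it, [[#"?"]] stands in for failure, because the answer type of an [[('a, 'a) parser]] over characters must itself be [[char]].

```sml
type 'b fail = unit -> 'b
type 'b resume = unit -> 'b
type ('a, 'b) succ = 'a list -> 'b -> 'b resume -> 'b
(* Assumed parser type; not shown in this section. *)
type ('a, 'b) parser = 'a list -> ('a, 'b) succ -> 'b fail -> 'b

(* Consume the first item if it satisfies p; a single way to succeed, so
   the failure continuation is passed on as the resume. *)
fun expect (p : 'a -> bool) : ('a, 'a) parser =
  fn input => fn succ => fn fail =>
    case input of
        x :: rest => if p x then succ rest x fail else fail ()
      | [] => fail ()

(* Hypothetical usage: return the leading digit, or #"?" if there is none. *)
fun leadingDigit (s : string) : char =
  expect Char.isDigit (String.explode s) (fn _ => fn c => fn _ => c) (fn () => #"?")
```

So [[leadingDigit "7x"]] is [[#"7"]], and [[leadingDigit "x7"]] is [[#"?"]].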
infix |||
op ||| : ('a, 'b) parser * ('a, 'b) parser -> ('a, 'b) parser
The parser [[p1 ||| p2]] first tries [[p1]]; if [[p1]] fails, it tries [[p2]] on the same input.
You will need to build a suitable failure continuation for [[p1]].
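Here is one way to build that failure continuation, sketched under the parser type assumed in the block. The failure continuation handed to [[p1]] runs [[p2]] on the original input. The helpers [[lit]], [[aOrB]], and [[run]] are hypothetical, and [[run]] returns [[#"?"]] when nothing parses.

```sml
type 'b fail = unit -> 'b
type 'b resume = unit -> 'b
type ('a, 'b) succ = 'a list -> 'b -> 'b resume -> 'b
(* Assumed parser type; not shown in this section. *)
type ('a, 'b) parser = 'a list -> ('a, 'b) succ -> 'b fail -> 'b

infix |||

(* If p1 fails, or all of its answers are rejected through resume, the
   failure continuation built here runs p2 on the same input. *)
fun op ||| (p1 : ('a, 'b) parser, p2 : ('a, 'b) parser) : ('a, 'b) parser =
  fn input => fn succ => fn fail =>
    p1 input succ (fn () => p2 input succ fail)

(* Hypothetical helper: accept exactly the character c. *)
fun lit (c : char) : (char, char) parser =
  fn input => fn succ => fn fail =>
    case input of
        x :: rest => if x = c then succ rest x fail else fail ()
      | [] => fail ()

val aOrB = lit #"a" ||| lit #"b"

(* Hypothetical driver: first result, or #"?" when every alternative fails. *)
fun run (p : (char, char) parser) (s : string) : char =
  p (String.explode s) (fn _ => fn c => fn _ => c) (fn () => #"?")
```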
infix >>=
op >>= : ('a, 'b) parser * ('b -> ('a, 'b) parser) -> ('a, 'b) parser
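Here is a sketch of [[>>=]] under the parser type assumed in the block. The success continuation built for [[p]] passes [[p]]'s result to [[k]]. It also threads [[p]]'s resume through as the failure continuation of [[k v]], so any failure inside [[k v]] backtracks into [[p]]. The block repeats a minimal [[expect]] so that it stands alone. [[doubled]], [[NoParse]], and [[run]] are hypothetical.

```sml
type 'b fail = unit -> 'b
type 'b resume = unit -> 'b
type ('a, 'b) succ = 'a list -> 'b -> 'b resume -> 'b
(* Assumed parser type; not shown in this section. *)
type ('a, 'b) parser = 'a list -> ('a, 'b) succ -> 'b fail -> 'b

infix >>=

(* Run p; on each success, run k v on the remaining input. If k v fails,
   it calls p's resume, asking p for another result. *)
fun op >>= (p : ('a, 'b) parser, k : 'b -> ('a, 'b) parser) : ('a, 'b) parser =
  fn input => fn succ => fn fail =>
    p input (fn rest => fn v => fn resume => k v rest succ resume) fail

(* Minimal expect, repeated so this block stands alone. *)
fun expect (p : 'a -> bool) : ('a, 'a) parser =
  fn input => fn succ => fn fail =>
    case input of
        x :: rest => if p x then succ rest x fail else fail ()
      | [] => fail ()

(* Hypothetical example: a letter followed by the same letter. *)
val doubled : (char, char) parser =
  expect Char.isAlpha >>= (fn c => expect (fn d => d = c))

exception NoParse  (* hypothetical: raised when every alternative fails *)

fun run (p : (char, char) parser) (s : string) : char =
  p (String.explode s) (fn _ => fn c => fn _ => c) (fn () => raise NoParse)
```

[[run doubled "aab"]] returns [[#"a"]], while [[run doubled "ab"]] raises [[NoParse]].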
(It would be pleasant if [[>>=]] had an even more general type,
such as [[('a, 'b) parser * ('b -> ('a, 'c) parser) -> ('a, 'c) parser]],
but making it so would require a better way of managing backtracking than
explicit success and failure continuations, whose answer type is tied to [['b]].
One technique that is particularly effective is to have an
[[('a, 'b) parser]] return a value of type
[[('a list * 'b) list]],
but this is efficient only if the outer list is lazy.)
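Here is a sketch of that lazy list-of-successes alternative, in which every name is hypothetical. SML lists are strict, so the block defines its own [[stream]] type with delayed tails. Results no longer flow through continuations whose answer type is fixed, so [[bind]] can have the more general type. The example uses this to turn a [[char]] result into an [[int]].

```sml
(* Hypothetical lazy stream: the tail is computed only when forced. *)
datatype 'a stream = Nil | Cons of 'a * (unit -> 'a stream)

(* A parser returns every (remaining input, result) pair, lazily. *)
type ('a, 'b) lparser = 'a list -> ('a list * 'b) stream

(* Append s to a delayed stream t without forcing t early. *)
fun append (Nil, t) = t ()
  | append (Cons (x, xs), t) = Cons (x, fn () => append (xs (), t))

(* Bind with the general type: the result type may change from 'b to 'c. *)
fun bind (p : ('a, 'b) lparser, k : 'b -> ('a, 'c) lparser) : ('a, 'c) lparser =
  fn input =>
    let
      fun go Nil = Nil
        | go (Cons ((rest, v), more)) = append (k v rest, fn () => go (more ()))
    in
      go (p input)
    end

(* Consume one item satisfying pred: at most one success. *)
fun sat pred (x :: rest) = if pred x then Cons ((rest, x), fn () => Nil) else Nil
  | sat _ [] = Nil

(* Take the first result only; later results are never computed. *)
fun first Nil = NONE
  | first (Cons ((_, v), _)) = SOME v

(* Example: parse one digit character and return its numeric value. *)
val digitValue : (char, int) lparser =
  bind (sat Char.isDigit, fn c => fn rest => Cons ((rest, ord c - ord #"0"), fn () => Nil))
```

Asking for only the first result never forces the rest of the stream, and that is where the laziness pays off.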
type regexp = (char, string) parser
empty : regexp
charclass : (char -> bool) -> regexp
cat : regexp * regexp -> regexp
star : regexp -> regexp
If you lean heavily on the combinators you have already written (especially [[>>=]], [[return]],
and [[|||]]), this will be trivial.
(You will also find the predefined function [[str]], which turns a [[char]] into a one-character [[string]], useful.)
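Here is a sketch of the regular-expression combinators, under the parser type assumed in the block. With the [[>>=]] type given above, [[expect p]] has result and answer type [[char]]. It therefore cannot be bound to a continuation that returns a [[(char, string) parser]], so this sketch writes [[charclass]] directly. [[empty]], [[cat]], and [[star]] use only [[return]], [[>>=]], and [[|||]]. The driver [[matches]], the exception [[NoMatch]], and the two example expressions are hypothetical.

```sml
type 'b fail = unit -> 'b
type 'b resume = unit -> 'b
type ('a, 'b) succ = 'a list -> 'b -> 'b resume -> 'b
(* Assumed parser type; not shown in this section. *)
type ('a, 'b) parser = 'a list -> ('a, 'b) succ -> 'b fail -> 'b

infix ||| >>=

fun return (v : 'b) : ('a, 'b) parser =
  fn input => fn succ => fn fail => succ input v fail

fun op ||| (p1 : ('a, 'b) parser, p2 : ('a, 'b) parser) : ('a, 'b) parser =
  fn input => fn succ => fn fail => p1 input succ (fn () => p2 input succ fail)

fun op >>= (p : ('a, 'b) parser, k : 'b -> ('a, 'b) parser) : ('a, 'b) parser =
  fn input => fn succ => fn fail =>
    p input (fn rest => fn v => fn resume => k v rest succ resume) fail

type regexp = (char, string) parser

val empty : regexp = return ""

(* Written directly: one character in the class, returned as a string. *)
fun charclass (p : char -> bool) : regexp =
  fn input => fn succ => fn fail =>
    case input of
        c :: rest => if p c then succ rest (str c) fail else fail ()
      | [] => fail ()

fun cat (r1 : regexp, r2 : regexp) : regexp =
  r1 >>= (fn s1 => r2 >>= (fn s2 => return (s1 ^ s2)))

(* Greedy: try one more r first, fall back to empty. Loops if r can match "". *)
fun star (r : regexp) : regexp =
  (r >>= (fn s => star r >>= (fn t => return (s ^ t)))) ||| empty

exception NoMatch  (* hypothetical: raised when no parse covers the input *)

(* Hypothetical driver: succeed only if some parse consumes all the input. *)
fun matches (r : regexp) (s : string) : bool =
  let
    (* Reject partial matches by resuming, which asks for another parse. *)
    fun whole rest m resume = if null rest then m else resume ()
  in
    (ignore (r (String.explode s) whole (fn () => raise NoMatch)); true)
    handle NoMatch => false
  end

val digits = cat (charclass Char.isDigit, star (charclass Char.isDigit))
val lettersThenA = cat (star (charclass Char.isAlpha), charclass (fn c => c = #"a"))
```

Because [[star r]] tries [[r]] before [[empty]], it is greedy, but it gives characters back on backtracking. That is why [[lettersThenA]] matches [["banana"]]: the star first consumes every letter, then resumes until it leaves the final [[a]] for [[charclass]].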