A classic regular expression is empty, a character, a concatenation, an alternation, or the closure of a regular expression. Because it's no harder, we will use "character classes" (represented as functions, of course).
<parcom.sml>= [D->]
datatype regexp
  = EMPTY
  | CHARCLASS of char -> bool
  | CAT of regexp * regexp
  | STAR of regexp
  | OR of regexp * regexp
A simple character can be represented this way:
<parcom.sml>+= [<-D]
val char : char -> regexp = fn c => CHARCLASS (fn c' => c = c')
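To see how the constructors compose, here is a small sketch that builds two regular expressions from the datatype above. The names digit, abc, and number are hypothetical and chosen only for illustration; the datatype and char are repeated so the fragment stands alone.

```sml
datatype regexp
  = EMPTY
  | CHARCLASS of char -> bool
  | CAT of regexp * regexp
  | STAR of regexp
  | OR of regexp * regexp

val char : char -> regexp = fn c => CHARCLASS (fn c' => c = c')

(* Hypothetical helper: a character class accepting any decimal digit. *)
val digit : regexp = CHARCLASS Char.isDigit

(* The expression a(b|c)*, written with the constructors. *)
val abc : regexp = CAT (char #"a", STAR (OR (char #"b", char #"c")))

(* One or more digits: a digit followed by zero or more digits. *)
val number : regexp = CAT (digit, STAR digit)
```

Because a character class is just a function of type char -> bool, any predicate from the Basis Library (such as Char.isDigit) can be used directly.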
Write a function regcognize : regexp -> string -> bool, which tells whether a string matches a regular expression. Hint: you may find success and failure continuations useful.
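The hint about continuations can be made concrete with a sketch like the following. It is one possible approach, not the only one: the helper name match, its argument order, and the "progress" check in the STAR case are all assumptions introduced here. A success continuation receives the remaining input and a resume thunk that backtracks; a failure continuation takes ().

```sml
datatype regexp
  = EMPTY
  | CHARCLASS of char -> bool
  | CAT of regexp * regexp
  | STAR of regexp
  | OR of regexp * regexp

val char : char -> regexp = fn c => CHARCLASS (fn c' => c = c')

(* match r cs succ fail: try to match a prefix of cs against r.
   On success, call succ with the unconsumed input and a thunk that
   resumes the search for another match; otherwise call fail (). *)
fun match EMPTY cs succ fail = succ cs fail
  | match (CHARCLASS p) (c :: cs) succ fail =
      if p c then succ cs fail else fail ()
  | match (CHARCLASS _) [] succ fail = fail ()
  | match (CAT (r1, r2)) cs succ fail =
      match r1 cs (fn cs' => fn resume => match r2 cs' succ resume) fail
  | match (OR (r1, r2)) cs succ fail =
      match r1 cs succ (fn () => match r2 cs succ fail)
  | match (STAR r) cs succ fail =
      (* Try one more iteration of r first; fall back to zero iterations.
         Insisting on progress avoids looping when r matches the empty string. *)
      match r cs
        (fn cs' => fn resume =>
           if length cs' < length cs
           then match (STAR r) cs' succ resume
           else resume ())
        (fn () => succ cs fail)

(* A string matches when some way of matching consumes all of it. *)
fun regcognize (r : regexp) (s : string) : bool =
  match r (explode s)
    (fn cs => fn resume => null cs orelse resume ())
    (fn () => false)
```

For example, regcognize (CAT (STAR (char #"x"), char #"x")) "xxx" evaluates to true: the star first consumes all three characters, the final char fails, and the resume thunk backs the star off by one iteration.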
regexp datatype entirely and instead define higher-order functions
or directly. Write a new version of regcognize to go with them.
datatype, but you can write new functions!
all that is like star except that it is greedy; that is, it insists on consuming as much input as possible. So the string
concat(star (char x), char x) but not concat(all (char x), char x).
'b as a function that will take a sequence of values of type 'a and do one of two things:
() to a failure continuation
('a, 'b) parser to represent a parser. Hint: try the following types for success and failure continuations:
type 'b fail = unit -> 'b
type 'b resume = unit -> 'b
type ('a, 'b) succ = 'a list -> 'b -> 'b resume -> 'b
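To show how the hinted types might fit together, the sketch below adds one possible definition of the parser type and two hypothetical helpers, anyItem and run, which are not part of the assignment. It assumes that the 'a list passed to a success continuation is the unconsumed input and that the 'b is the parsed value.

```sml
type 'b fail = unit -> 'b
type 'b resume = unit -> 'b
type ('a, 'b) succ = 'a list -> 'b -> 'b resume -> 'b

(* Assumption: a parser takes the input, a success continuation,
   and a failure continuation, and produces the final answer. *)
type ('a, 'b) parser = 'a list -> ('a, 'b) succ -> 'b fail -> 'b

(* Hypothetical parser: consume and return the first item, if there is one. *)
fun anyItem ([] : 'a list) (succ : ('a, 'a) succ) (fail : 'a fail) : 'a = fail ()
  | anyItem (x :: xs) succ fail = succ xs x fail

(* Hypothetical driver: return the first parsed value, or a default on failure. *)
fun run (p : ('a, 'b) parser) (input : 'a list) (default : 'b) : 'b =
  p input (fn _ => fn v => fn _ => v) (fn () => default)

val first = run anyItem [1, 2, 3] 0   (* first item of the input *)
val none = run anyItem [] 0           (* falls back to the default *)
```

Note that under these types the final answer has the same type 'b as the parsed value, which is why the driver needs a default of type 'b to return on failure.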
eof : ('a, unit) parser, which succeeds when it has reached end of file and fails otherwise.
return : 'b -> ('a, 'b) parser, which always succeeds and never consumes any input.
expect : ('a -> bool) -> ('a, 'a) parser such that expect p succeeds if the input is nonempty and the first item in the input satisfies p and fails otherwise. If expect p succeeds, it returns that first item.
or operates on parsers:
infix |||
op ||| : ('a, 'b) parser * ('a, 'b) parser -> ('a, 'b) parser
The parser p1 ||| p2 first tries p1; if p1 fails, it tries p2. You will need to build a suitable failure continuation for p1.
infix >>=
op >>= : ('a, 'b) parser * ('b -> ('a, 'b) parser) -> ('a, 'b) parser
(It would be pleasant if >>= had an even more general type, but to make it so would require a better way of managing backtracking than explicit success and failure continuations. One technique that is particularly effective is to have an ('a, 'b) parser return a value of type ('a list * 'b) list, but this is efficient only if the outer list is lazy.)
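The list-of-results technique mentioned in the parenthetical can be sketched as follows. This is an illustration under stated assumptions: all names (lparser, lreturn, lexpect, lor, lbind, twoDigits) are hypothetical, and the sketch uses ordinary eager lists, so it computes every alternative up front; an efficient version would need a lazy outer list.

```sml
(* A parser returns every (remaining input, value) pair; [] means failure. *)
type ('a, 'b) lparser = 'a list -> ('a list * 'b) list

(* Succeed with v without consuming input. *)
fun lreturn v = fn input => [(input, v)]

(* Succeed with the first item if it satisfies p. *)
fun lexpect p =
  fn [] => []
   | (x :: xs) => if p x then [(xs, x)] else []

(* Alternation: all results of p1 followed by all results of p2. *)
fun lor (p1, p2) = fn input => p1 input @ p2 input

(* Sequencing: feed each result of p to k.  The value types of p and
   of k's parser may differ, so this bind has the more general type. *)
fun lbind (p, k) =
  fn input => List.concat (map (fn (rest, v) => k v rest) (p input))

(* Two digits, returned as a string: char values in, a string value out. *)
val twoDigits =
  lbind (lexpect Char.isDigit, fn d1 =>
  lbind (lexpect Char.isDigit, fn d2 =>
  lreturn (implode [d1, d2])))
```

Applied to explode "42x", twoDigits yields a single result whose value is "42" and whose remaining input is the one-character list containing x. Backtracking falls out of the list structure: a later parser simply moves on to the next element.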
p >>= k successfully will require constructing a suitable success
type regexp = (char, string) parser
empty : regexp
charclass : (char -> bool) -> regexp
cat : regexp * regexp -> regexp
star : regexp -> regexp
If you lean heavily on the combinators you have already done (especially |||), this will be trivial. (You will also find the predefined function