parcom.sml.
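A classic regular expression is empty, a single character, a concatenation of two regular expressions, an alternation of two regular expressions, or the closure of a regular expression. Because it is no harder, we will use "character classes" in place of single characters, representing each class as a function of type char -> bool.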
<parcom.sml>=
datatype regexp
  = EMPTY
  | CHARCLASS of char -> bool
  | CAT of regexp * regexp
  | STAR of regexp
  | OR of regexp * regexp
A simple character can be represented this way:
<parcom.sml>+=
val char : char -> regexp = fn c => CHARCLASS (fn c' => c = c')
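To make the representation concrete, here is a small sketch that builds a few regular expressions from the constructors above. The names digit, digits, and integer are hypothetical illustrations, not part of the assignment.

```sml
datatype regexp
  = EMPTY
  | CHARCLASS of char -> bool
  | CAT of regexp * regexp
  | STAR of regexp
  | OR of regexp * regexp

val char : char -> regexp = fn c => CHARCLASS (fn c' => c = c')

(* Hypothetical example values; these names are not from the assignment. *)

(* any single decimal digit, expressed as a character class *)
val digit : regexp = CHARCLASS Char.isDigit

(* one or more digits: a digit followed by zero or more digits *)
val digits : regexp = CAT (digit, STAR digit)

(* an optional minus sign followed by one or more digits *)
val integer : regexp = CAT (OR (EMPTY, char #"-"), digits)
```

Note that a character class is just a predicate, so any function of type char -> bool from the Basis Library (such as Char.isDigit) can be used directly.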
Write a function recognize of type regexp -> string -> bool, which tells whether a string matches a regular expression.
Hint: you may find success and failure continuations useful.
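As one possible shape for the hint, the sketch below uses only a success continuation: the continuation k receives the input left over after a match, and Boolean orelse stands in for an explicit failure continuation. The helper name match is hypothetical, and this is a simplification of the two-continuation style the hint suggests, not a required design.

```sml
datatype regexp
  = EMPTY
  | CHARCLASS of char -> bool
  | CAT of regexp * regexp
  | STAR of regexp
  | OR of regexp * regexp

val char : char -> regexp = fn c => CHARCLASS (fn c' => c = c')

(* match r cs k: does some prefix of cs match r such that the
   continuation k accepts the remaining input?  (hypothetical helper) *)
fun match EMPTY cs k = k cs
  | match (CHARCLASS _) [] _ = false
  | match (CHARCLASS p) (c :: cs) k = p c andalso k cs
  | match (CAT (r1, r2)) cs k = match r1 cs (fn rest => match r2 rest k)
  | match (OR (r1, r2)) cs k = match r1 cs k orelse match r2 cs k
  | match (STAR r) cs k =
      (* try zero repetitions first, then one more repetition *)
      k cs orelse
      match r cs (fn rest =>
        (* insist on progress so that STAR EMPTY cannot loop forever *)
        length rest < length cs andalso match (STAR r) rest k)

(* the whole string matches when the final continuation sees no leftover input *)
fun recognize r s = match r (String.explode s) null
```

Because the continuation returns false when the rest of the match fails, orelse automatically retries the next alternative, which is what gives the recognizer its backtracking behavior.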
Eliminate the regexp datatype entirely and instead define higher-order functions empty, charclass, cat, star, and or directly. Write a new version of recognize to go with them.
datatype, but you can write new functions!
all that is like star except that it is greedy; that is, it insists on consuming as much input as possible. So the string xxx would match concat(star (char x), char x) but not concat(all (char x), char x).
'a to 'b as a function that will take a sequence of values of type 'a and do one of two things:
'b, or
() to a failure continuation
Define a type ('a, 'b) parser to represent a parser.
Hint: try the following types for success and failure continuations:
type 'b fail = unit -> 'b
type 'b resume = unit -> 'b
type ('a, 'b) succ = 'a list -> 'b -> 'b resume -> 'b
eof : ('a, unit) parser
which succeeds when it has reached end of file and fails otherwise.
return : 'b -> ('a, 'b) parser
which always succeeds, returns its argument, and never consumes any input.
expect : ('a -> bool) -> ('a, 'a) parser
such that expect p succeeds if the input is nonempty and the first item in the input satisfies p and fails otherwise. If expect p succeeds, it returns that first item.
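The three primitive parsers can be sketched as follows. The definition of ('a, 'b) parser used here (input, then success continuation, then failure continuation) is an assumption built from the hinted types, and the exception NoParse and the helper run are hypothetical names added only so the parsers can be tried out.

```sml
type 'b fail = unit -> 'b
type 'b resume = unit -> 'b
type ('a, 'b) succ = 'a list -> 'b -> 'b resume -> 'b

(* Assumed representation: a parser takes the input, a success
   continuation, and a failure continuation. *)
type ('a, 'b) parser = 'a list -> ('a, 'b) succ -> 'b fail -> 'b

(* succeeds only when no input remains *)
val eof : ('a, unit) parser =
  fn input => fn succ => fn fail =>
    case input of
      [] => succ [] () fail
    | _ :: _ => fail ()

(* always succeeds with v and leaves the input untouched *)
fun return (v : 'b) : ('a, 'b) parser =
  fn input => fn succ => fn fail => succ input v fail

(* succeeds with the first item when it satisfies p *)
fun expect (p : 'a -> bool) : ('a, 'a) parser =
  fn input => fn succ => fn fail =>
    case input of
      x :: rest => if p x then succ rest x fail else fail ()
    | [] => fail ()

(* Hypothetical driver: report the first result, or NONE on failure. *)
exception NoParse
fun run (p : ('a, 'b) parser) (input : 'a list) : 'b option =
  SOME (p input (fn _ => fn v => fn _ => v) (fn () => raise NoParse))
  handle NoParse => NONE
```

Each primitive passes its own failure continuation along as the resumption, since a primitive has no further alternatives of its own to offer.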
or operates on parsers:
infix |||
op ||| : ('a, 'b) parser * ('a, 'b) parser -> ('a, 'b) parser
The parser p1 ||| p2 first tries p1, then p2. You will need to build a suitable failure continuation for p1.
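A minimal sketch of the choice combinator, assuming the same hypothetical parser representation (input, success continuation, failure continuation): the failure continuation handed to p1 simply runs p2 on the original input. The names run, NoParse, and alnum are illustrative only.

```sml
type 'b fail = unit -> 'b
type 'b resume = unit -> 'b
type ('a, 'b) succ = 'a list -> 'b -> 'b resume -> 'b
(* assumed representation, not fixed by the assignment *)
type ('a, 'b) parser = 'a list -> ('a, 'b) succ -> 'b fail -> 'b

fun expect (p : 'a -> bool) : ('a, 'a) parser =
  fn input => fn succ => fn fail =>
    case input of
      x :: rest => if p x then succ rest x fail else fail ()
    | [] => fail ()

infix |||

(* If p1 fails, now or after backtracking, fall through to p2
   on the same input. *)
fun op ||| (p1 : ('a, 'b) parser, p2 : ('a, 'b) parser) : ('a, 'b) parser =
  fn input => fn succ => fn fail =>
    p1 input succ (fn () => p2 input succ fail)

(* Hypothetical driver and example. *)
exception NoParse
fun run (p : ('a, 'b) parser) (input : 'a list) : 'b option =
  SOME (p input (fn _ => fn v => fn _ => v) (fn () => raise NoParse))
  handle NoParse => NONE

val alnum = expect Char.isDigit ||| expect Char.isAlpha
```

Because p1 receives the new failure continuation, any resumption it hands to its success continuation also leads to p2, so later failures can still backtrack into the second alternative.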
infix >>=
op >>= : ('a, 'b) parser * ('b -> ('a, 'b) parser) -> ('a, 'b) parser
(It would be pleasant if >>= had an even more general type, but to make it so would require a better way of managing backtracking than explicit success and failure continuations. One technique that is particularly effective is to have an ('a, 'b) parser return a value of type ('a list * 'b) list, but this is efficient only if the outer list is lazy.)
To implement p >>= k successfully will require constructing a suitable success continuation for p.
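One way the sequencing combinator might look, again assuming the hypothetical representation in which a parser takes the input, a success continuation, and a failure continuation. The success continuation built for p runs k on p's result and the remaining input, and uses p's resumption as the failure continuation so that a failure in k backtracks into p. The names run, NoParse, and doubled are illustrative only.

```sml
type 'b fail = unit -> 'b
type 'b resume = unit -> 'b
type ('a, 'b) succ = 'a list -> 'b -> 'b resume -> 'b
(* assumed representation, not fixed by the assignment *)
type ('a, 'b) parser = 'a list -> ('a, 'b) succ -> 'b fail -> 'b

fun expect (p : 'a -> bool) : ('a, 'a) parser =
  fn input => fn succ => fn fail =>
    case input of
      x :: rest => if p x then succ rest x fail else fail ()
    | [] => fail ()

infix >>=

(* Run p; on success, feed its result v to k and continue on the
   remaining input.  If k v fails, resume p's other alternatives. *)
fun op >>= (p : ('a, 'b) parser, k : 'b -> ('a, 'b) parser) : ('a, 'b) parser =
  fn input => fn succ => fn fail =>
    p input (fn rest => fn v => fn resume => k v rest succ resume) fail

(* Hypothetical driver and example. *)
exception NoParse
fun run (p : ('a, 'b) parser) (input : 'a list) : 'b option =
  SOME (p input (fn _ => fn v => fn _ => v) (fn () => raise NoParse))
  handle NoParse => NONE

(* a letter followed immediately by the same letter *)
val doubled = expect Char.isAlpha >>= (fn c => expect (fn c' => c' = c))
```

The example shows why the second parser is a function of the first result: the predicate used for the second character depends on which character was read first.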
type regexp = (char, string) parser
empty : regexp
charclass : (char -> bool) -> regexp
cat : regexp * regexp -> regexp
star : regexp -> regexp
If you lean heavily on the combinators you have already done (especially >>=, return, and |||), this will be trivial. (You will also find the predefined function str useful.)
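The sketch below shows how these definitions might fall out of the combinators, under the same assumed parser representation as before. Two choices here are assumptions rather than requirements: charclass is written directly against the continuations, because with the types as stated the result of expect p is a char while a regexp must produce a string, and star tries the longest match first. The names recognize and NoMatch are hypothetical.

```sml
type 'b fail = unit -> 'b
type 'b resume = unit -> 'b
type ('a, 'b) succ = 'a list -> 'b -> 'b resume -> 'b
(* assumed representation, not fixed by the assignment *)
type ('a, 'b) parser = 'a list -> ('a, 'b) succ -> 'b fail -> 'b

infix |||
infix >>=

fun return (v : 'b) : ('a, 'b) parser =
  fn input => fn succ => fn fail => succ input v fail

fun op ||| (p1 : ('a, 'b) parser, p2 : ('a, 'b) parser) : ('a, 'b) parser =
  fn input => fn succ => fn fail =>
    p1 input succ (fn () => p2 input succ fail)

fun op >>= (p : ('a, 'b) parser, k : 'b -> ('a, 'b) parser) : ('a, 'b) parser =
  fn input => fn succ => fn fail =>
    p input (fn rest => fn v => fn resume => k v rest succ resume) fail

(* a regexp parses characters and returns the string it matched *)
type regexp = (char, string) parser

val empty : regexp = return ""

(* match one character in the class; str turns it into a string *)
fun charclass (p : char -> bool) : regexp =
  fn input => fn succ => fn fail =>
    case input of
      c :: rest => if p c then succ rest (str c) fail else fail ()
    | [] => fail ()

fun cat (r1 : regexp, r2 : regexp) : regexp =
  r1 >>= (fn s1 => r2 >>= (fn s2 => return (s1 ^ s2)))

(* One or more repetitions first, then zero.  Caveat: this loops
   forever if r can match the empty string. *)
fun star (r : regexp) : regexp =
  fn input => fn succ => fn fail =>
    (cat (r, star r) ||| empty) input succ fail

(* Hypothetical recognizer: accept only matches that use all the input,
   backtracking through the resumption otherwise. *)
exception NoMatch
fun recognize (r : regexp) (s : string) : bool =
  (r (explode s)
     (fn rest => fn v => fn resume => if null rest then v else resume ())
     (fn () => raise NoMatch);
   true)
  handle NoMatch => false
```

The eta-expanded definition of star matters: writing star r as an explicit function of the input delays the recursive call until the parser is actually run, so constructing a regexp always terminates.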