CS152 Homework: Core ML

Due Tuesday, March 6, at 11:59 PM. The purpose of this assignment is to help you get acclimated to programming in ML, which you will do by writing many small exercises. Many of you are already acclimated; take note that I am aware of two populations in this course:

You will be more effective if you are aware of the Standard ML Basis Library. To my deep annoyance, different implementations of Standard ML implement different library versions by default. In particular, Moscow ML implements the 1997 basis, whereas MLton implements the 2004 basis. Standard ML of New Jersey's basis depends on the version, but it is no longer provided on the FAS 'nice' servers and we don't recommend it.

In any case, the 1997 basis is used both in the Ramsey and Kamin text and in Ullman, Chapter 9. I therefore recommend that you use this basis. The best guide to the basis is the Moscow ML help system; type

- help "";
at the mosml interactive prompt. The script /home/c/s/cs152/bin/mlton-compile runs MLton using this basis.

Guidelines

For all the problems in this homework, use function definition by pattern matching. In particular, do not use the functions [[null]], [[hd]], and [[tl]]; use patterns instead. Some useful list patterns include these patterns, to match lists of exactly 0, 1, 2, or 3 elements:

    []
    [x]
    [x, y]
    [a, b, c]

and also these patterns, which match lists of at least 0, 1, 2, or 3 elements:

    l
    h::t
    x1::x2::xs
    a::b::c::l

When using these patterns, remember that function application has higher precedence than any infix operator! This is as true in patterns as it is anywhere else.
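
As a rough illustration of the pattern and precedence rules above, here is a small sketch. The functions [[describe]] and [[firstOr]] are hypothetical names invented for this example; they are not part of the assignment.

```sml
(* Hypothetical helper: classifies a list by its length using only patterns. *)
fun describe []             = "empty"
  | describe [_]            = "exactly one"
  | describe [_, _]         = "exactly two"
  | describe (_ :: _ :: _ :: _) = "at least three"

(* Hypothetical helper: returns the first element, or a default for [].
   The parentheses around (h :: _) are required: function application
   binds tighter than ::, so a clause written without them is not read
   as a function applied to a cons pattern. *)
fun firstOr d []       = d
  | firstOr _ (h :: _) = h

val d0 = describe ([] : int list)
val d3 = describe [1, 2, 3, 4]
val f1 = firstOr 0 [7, 8, 9]
```

Note that the last clause of [[describe]] uses an "at least three" pattern, so together with the three "exactly" patterns it covers every list.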

Do not define auxiliary functions at top level. Use [[local]] or [[let]]. Do not use [[open]]; if needed, use one-letter abbreviations for common structures. Do not use any imperative features unless the problem explicitly says it is OK.

Feel free to use the standard basis extensively. (But beware that the documentation at standardml.org may not be consistent with your implementation.) Moscow ML's [[help "lib";]] will tell you all about the library. And if you use

mosml -P full
as your interactive top-level loop, it will automatically load almost everything you might want from the standard basis.

All the sample code we show you is gathered in one place online.

As you start to learn ML, this table may help you convert your current knowledge:

    μScheme    ML
    val        val
    define     fun
    lambda     fn

Put all your solutions in one file: warmup.sml. (If separate files are easier, combine them with cat.) To receive credit, your warmup.sml file must compile and execute in the Moscow ML system. For example, we must be able to compile your code without warnings or errors:
ice3 /tmp >> /home/c/s/cs152/bin/mosmlc -c warmup.sml
ice3 /tmp >> 
Please remember to put your name, userid, and time spent in the warmup.sml file.

The homework problems

Solve the following problems:

Higher-order programming

  1. [7pts] Here's a function that is somewhat like [[fold]], but it works on binary operators.

    1. Define a function
      compound : ('a * 'a -> 'a) -> int -> 'a -> 'a
      
      that ``compounds'' a binary operator [[rator]] so that [[compound rator n x]] is [[x]] if [[n=0]], [[x rator x]] if [[n = 1]], and in general [[x rator (x rator (... rator x))]] where [[rator]] is applied exactly [[n]] times. [[compound rator]] need not behave well when applied to negative integers.

    2. When [[rator]] is associative, it is not necessary to apply it so many times. Define a function [[acompound]] that has the same type as [[compound]], and for an associative [[rator]] computes the same results, but such that [[acompound rator n x]] requires only O(log n) applications of [[rator]] to compute.

    3. Use the [[acompound]] function to define a function for integer exponentiation
      pow : int -> int -> int
      
      so that, for example, [[pow 3 2]] evaluates to 9. Hint: take note of the description of [[op]] in Ullman §5.4.4, page 165.

    Don't get confused by infix vs prefix operators. Remember this:

    • Fixity is a property of an identifier, not of a value.
    • If <$> is an infix identifier, then x <$> y is syntactic sugar for <$> applied to a pair containing x and y, which can also be written as op <$> (x, y).
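
The two points above can be sketched as follows. The identifier [[<+>]] and its definition are assumptions made up for this illustration; only [[op]], [[foldl]], and [[+]] come from the language and basis.

```sml
(* Hypothetical infix identifier, declared only for this illustration. *)
infix 6 <+>
fun x <+> y = 2 * x + y

val a = 3 <+> 4          (* infix use of the identifier *)
val b = op <+> (3, 4)    (* the same function applied to a pair *)

(* op turns an infix identifier into an ordinary function value,
   which can then be passed to a higher-order function. *)
val total = foldl (op +) 0 [1, 2, 3]
```

One pitfall worth knowing: an [[op]] applied to the multiplication operator and followed immediately by a closing parenthesis is read as the end of a comment, so leave a space before the parenthesis.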

Patterns

  1. [4pts] Consider the pattern [[(x::y::zs, w)]]. For each of the following expressions, tell whether the pattern matches the value denoted. If the pattern matches, say what values are bound to the four variables [[x]], [[y]], [[zs]], and [[w]]. If it does not match, explain why not.
    1. [[([1, 2, 3], ("CS", 152))]]
    2. [[(("CS", 152), [1, 2, 3])]]
    3. [[([("CS", 152)], (1, 2, 3))]]
    4. [[(["CS", "152"], true)]]
    5. [[([true, false], 2.718281828)]]

  2. [2pts] Using patterns, write a recursive Fibonacci function that does not use [[if]].

  3. [4pts] Write a function that takes a list of lower-case letters and returns [[true]] if the first character is a vowel and [[false]] if the first character is not a vowel or if the list is empty. Use the wildcard symbol [[_]] whenever possible, and avoid [[if]]. Remember that the ML character syntax is [[#"x"]], as described in Ullman, page 13.

  4. [2pts] Write the function [[null]], which when applied to a list tells whether the list is empty. Avoid [[if]], and make sure the function takes constant time. Make sure your function has the same type as the [[null]] in the Standard Basis.

Lists

  1. [2pts] [[foldl]] and [[foldr]] are predefined with type
    ('a * 'b -> 'b) -> 'b -> 'a list -> 'b
    
    They are like the μScheme versions except the ML versions are Curried.
    1. Implement [[length]] using [[foldl]] or [[foldr]].
    2. Implement [[rev]] using [[foldl]] or [[foldr]].
    3. Implement [[minlist]], which returns the smallest element of a non-empty list of integers. Your solution should work regardless of the representation of integers (e.g., it should not matter how many bits are used to represent integers). Your solution can fail (e.g., by [[raise Match]]) if given an empty list of integers. Use [[foldl]] or [[foldr]].
    Do not use recursion in any of your solutions.
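
To make the argument order and the Currying concrete, here is a small sketch that uses only the predefined [[foldl]] and [[foldr]]. The value names are invented for the example, and it deliberately does not solve any of the parts above.

```sml
(* Both folds take the operator, then the starting value, then the list. *)
val sum   = foldl (op +) 0 [1, 2, 3, 4]

(* String concatenation is not commutative, so it exposes the order
   in which each fold combines elements. *)
val left  = foldl (op ^) "" ["a", "b", "c"]
val right = foldr (op ^) "" ["a", "b", "c"]

(* Because the folds are Curried, supplying only the first two
   arguments yields a new function on lists. *)
val sumAll : int list -> int = foldl (op +) 0
```

Here [[left]] is "cba" and [[right]] is "abc": [[foldl]] combines the leftmost element with the starting value first, while [[foldr]] starts from the right.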

  2. [2pts] Implement [[foldl]] and [[foldr]] using recursion. Do not create unnecessary cons cells. Do not use [[if]].

  3. [15pts] Implement queues using no side effects.
    1. For a first cut, try the following representation:

        exception Empty
        type 'a queue = 'a list
        val put : 'a queue * 'a -> 'a queue
        val get : 'a queue -> 'a * 'a queue

      Implement [[put]] and [[get]]. [[get]] should raise the exception [[Empty]] if the queue is empty.
    2. The representation shown above is unpleasant in that either [[put]] or [[get]] requires O(n) time. Using a pair of lists to represent a queue, implement [[put]] and [[get]] that take constant ``amortized'' time. (That is, a combination of [[n]] puts and gets, in any reasonable order, can be expected to take O(n) time total, instead of possibly O(n-squared) as above.) Hint: think about the tricks we used in class to come up with a cheap list-reversal function.

  4. [4pts] Write a function [[zip : 'a list * 'b list -> ('a * 'b) list]] that takes a pair of lists (of equal length) and returns the equivalent list of pairs. Raise the exception [[Mismatch]] if the lengths don't match.

  5. [4pts] Define a function
    pairfoldr : ('a * 'b * 'c -> 'c) -> 'c -> 'a list * 'b list -> 'c
    
    that applies a three-argument function to a pair of lists of equal length, using the same order as [[foldr]]. Use [[pairfoldr]] to implement [[zip]].

  6. [6pts] Define a function [[unzip : ('a * 'b) list -> 'a list * 'b list]] that turns a list of pairs into a pair of lists. This one is tricky; here's a sample result:

      - unzip [(1, true), (3, false)];
      > val it = ([1, 3], [true, false]) : int list * bool list

    Hint: Try defining an auxiliary function that uses the method of accumulating parameters, and be prepared to use [[rev]].

  7. [3pts] Define a function [[flatten : 'a list list -> 'a list]], which takes a list of lists and produces a single list containing all the elements in the correct order. For example,

      - flatten [[1], [2, 3, 4], [], [5, 6]];
      > val it = [1, 2, 3, 4, 5, 6] : int list

    To get full credit for this problem, your function should use no unnecessary cons cells.

Strings

For this section it may help you to be aware of the built-in functions [[implode]], [[explode]], and [[size]].
  1. [10pts] The goal of this problem is to help you explore why the Standard ML Basis Library has a built-in function [[concat]]. The problem has four parts:
    1. Without using [[concat]], define a function [[sflatten : string list -> string]], which takes a list of strings and produces a single string containing all the original strings concatenated in the correct order. Make your function as simple as you can.
    2. Explain how much space is used by [[sflatten]]. If you can't do an exact calculation, feel free to use Big-O notation.
    3. Define a version of [[sflatten]] that uses as little space as possible. How does its space usage compare with that of your original definition?
    4. A built-in, primitive function like [[concat]] can be written in C or in assembly language, and so can do things that an ML function cannot. Is it possible that the primitive [[concat]] uses even less space than your version from part 3? If so, how much less? Is it worth having [[concat]]? Why?

Exceptions

  1. [6pts] Write a (Curried) function [[nth : int -> 'a list -> 'a]] to return the nth element of a list. (Number elements from 0.) Define one or more suitable exceptions to tell what is wrong in case the function is not defined on its arguments. Raise the appropriate exception in response to erroneous inputs.

  2. [17pts] Environments
    1. Define a type [['a env]] and functions

        type 'a env = (* you fill in this part *)
        exception NotFound of string
        val emptyEnv : 'a env = (* ... *)
        val bindVar : string * 'a * 'a env -> 'a env = (* ... *)
        val lookup : string * 'a env -> 'a = (* ... *)

      such that you can use [['a env]] for a type environment or a value environment. On an attempt to look up an identifier that doesn't exist, raise the exception [[NotFound]]. Don't worry about efficiency.
    2. Do the same, except make [[type 'a env = string -> 'a]], and let

        fun lookup (name, rho) = rho name

    3. Write a function [[isBound : string * 'a env -> bool]] that works with both representations of environments. That is, write a single function that works regardless of whether environments are implemented as lists or as functions. You will need imperative features, like sequencing (the semicolon). Don't use [[if]].
    4. Write a function [[extendEnv : string list * 'a list * 'a env -> 'a env]] that takes a list of variables and a list of values and adds the corresponding bindings to an environment. It should work with both representations. Do not use recursion. Hint: you can do it in two lines using the higher-order list functions defined above.

Discriminated unions ([[datatype]])

  1. [12pts] Search trees.
    ML can easily represent binary trees containing arbitrary values in the nodes:

        datatype 'a tree = NODE of 'a tree * 'a * 'a tree
                         | LEAF

    To make a search tree, we need to compare values at nodes. The standard idiom for comparison is to define a function that returns a value of type [[order]]. As discussed in Ullman, page 325, [[order]] is predefined by

        datatype order = LESS | EQUAL | GREATER

    Because [[order]] is predefined, if you include it in your program, you will hide the predefined version (which is in the so-called ``initial basis'') and other things may break mysteriously. So don't include it.

    We can use the [[order]] idiom to define a higher-order insertion function by, e.g.,

        fun insert cmp =
          let fun ins (x, LEAF) = NODE (LEAF, x, LEAF)
                | ins (x, NODE (left, y, right)) =
                    (case cmp (x, y)
                       of LESS    => NODE (ins (x, left), y, right)
                        | GREATER => NODE (left, y, ins (x, right))
                        | EQUAL   => NODE (left, x, right))
          in  ins
          end

    This higher-order insertion function accepts a comparison function as argument, then returns an insertion function. (The parentheses around [[case]] aren't actually necessary here, but I've included them because if you leave them out when they are needed, you will be very confused by the resulting error messages.)
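
As a sketch of how the insertion function might be used, the block below repeats the tree type and [[insert]] from above so that it stands alone, then specializes [[insert]] to integers with [[Int.compare]]. The names [[insertInt]] and [[t]] are assumptions made up for this example.

```sml
datatype 'a tree = NODE of 'a tree * 'a * 'a tree
                 | LEAF

(* The insertion function from the text, repeated for self-containment. *)
fun insert cmp =
  let fun ins (x, LEAF) = NODE (LEAF, x, LEAF)
        | ins (x, NODE (left, y, right)) =
            (case cmp (x, y)
               of LESS    => NODE (ins (x, left), y, right)
                | GREATER => NODE (left, y, ins (x, right))
                | EQUAL   => NODE (left, x, right))
  in  ins
  end

(* Hypothetical name: insertion specialized to integers. *)
val insertInt = insert Int.compare

(* Build a tree by inserting 2, then 1, then 3 into the empty tree. *)
val t = foldl insertInt LEAF [2, 1, 3]
```

Since 2 is inserted first it becomes the root, with 1 in the left subtree and 3 in the right. Inserting an element that compares [[EQUAL]] to an existing one replaces it and leaves the shape of the tree unchanged.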

    We can use this idea to implement polymorphic sets in which we store the comparison function in the set itself. For example,

        datatype 'a set = SET of ('a * 'a -> order) * 'a tree
        fun nullset cmp = SET (cmp, LEAF)


    • Write a function [[addelt]] of type [['a * 'a set -> 'a set]] that adds an element to a set.
    • Write a function [[treeFoldr]] of type [[('a * 'b -> 'b) -> 'b -> 'a tree -> 'b]] that folds a function over every element of a tree, rightmost element first. [[treeFoldr op :: [] t]] should return the elements of [[t]] in order. Write a similar function [[setFold]] of type [[('a * 'b -> 'b) -> 'b -> 'a set -> 'b]].

      The function [[setFold]] should visit every element of the set exactly once, in an unspecified order.

Extra credit

There are three extra-credit problems: FIVES, VARARGS, and PARCOM.

FIVES

Recall the following problem from the Scheme homework:
Consider the class of well-formed arithmetic computations using the numeral 5. These are expressions formed by taking the integer literal 5, the four arithmetic operators +, -, *, and /, and properly placed parentheses. Such expressions correspond to binary trees in which the internal nodes are operators and every leaf is a 5. Write a μScheme program to answer one or more of the following questions:
  • What is the smallest positive integer that cannot be computed by an expression involving exactly five 5's?
  • What is the largest prime number that can be computed by an expression involving exactly five 5's?
  • Exhibit an expression that evaluates to that prime number.
Write an ML function [[reachable]] of type
('a * 'a -> order) * ('a * 'a -> 'a) list -> 'a -> int -> 'a set
such that [[reachable (Int.compare, [op +, op -, op *, op div]) 5 5]] computes the set of all integers computable using the given operators and exactly five 5's. (You don't have to bother giving the answers to the questions above, since they're easy to get with [[setFold]].) My solution is under 20 lines of code, but it makes heavy use of the [[setFold]], [[nullset]], [[addelt]], and [[pairfoldr]] functions defined earlier.

Hints:

VARARGS

Extend μScheme to support procedures with a variable number of arguments. Do so by giving the name [[...]] (three dots) special significance when it appears as the last formal parameter in a lambda. For example:
-> (val f (lambda (x y ...) (+ x (+ y (foldl + 0 ...)))))
-> (f 1 2 3 4 5) ; inside f, rho = { x |-> 1, y |-> 2, ... |-> '(3 4 5) }
15
In this example, it is an error for [[f]] to get fewer than two arguments. If [[f]] gets at least two arguments, any additional arguments are placed into an ordinary list, and the list is used to initialize the location of the formal parameter associated with [[...]].
  1. Implement this new feature inside of mlscheme.sml. I recommend that you begin by changing the definition of [[lambda]] on page 187 to
       and lambda = name list * { varargs : bool } * exp
    
    The type system will tell you what other code you have to change. For the parser, you may find the following function useful:
      fun newLambda (formals, body) =
         case rev formals
           of "..." :: fs' => LAMBDA (rev fs', {varargs=true},  body)
            | _            => LAMBDA (formals, {varargs=false}, body)
    
    The type of this function is
    [[name list * exp -> name list * {varargs : bool} * exp]];
    thus it is designed exactly for you to adapt old syntax to new syntax; you just drop it into the parser wherever [[LAMBDA]] was used.
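
To see what [[newLambda]] does with a trailing [[...]], here is a self-contained sketch. The [[exp]] datatype below is a cut-down stand-in invented for this example (the real one in mlscheme.sml has many more constructors), and the names [[varargsFn]] and [[plainFn]] are likewise hypothetical.

```sml
type name = string

(* Stand-in for the interpreter's exp type; only two constructors are
   kept so that the example can run on its own. *)
datatype exp = LITERAL of int
             | LAMBDA  of name list * {varargs : bool} * exp

(* The function from the text: reverse the formals to inspect the last one. *)
fun newLambda (formals, body) =
  case rev formals
    of "..." :: fs' => LAMBDA (rev fs', {varargs=true},  body)
     | _            => LAMBDA (formals, {varargs=false}, body)

val varargsFn = newLambda (["x", "y", "..."], LITERAL 0)
val plainFn   = newLambda (["x", "y"],        LITERAL 0)
```

In this sketch the trailing [[...]] is dropped from the list of formals and recorded in the [[varargs]] flag instead; a parameter list without it passes through unchanged.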

  2. As a complement to the varargs lambda, write a new [[call]] primitive such that
    [[(call f '(1 2 3))]]
    is equivalent to
    [[(f 1 2 3)]]
    Sadly, you won't be able to use [[PRIMITIVE]] for this; you'll have to invent a new kind of thing that has access to the internal [[eval]].

  3. Demonstrate these utilities by writing a higher-order function [[cons-logger]] that counts cons calls in a private variable. It should operate as follows:
    -> (val cl (cons-logger))
    -> (val log-cons (car cl))
    -> (val conses-logged (cdr cl))  
    -> (conses-logged)
    0
    -> (log-cons f e1 e2 ... en) ; returns (f e1 e2 ... en), incrementing
                                 ; private counter whenever cons is called
    -> (conses-logged)
    99  ; or whatever else is the number of times cons is called 
        ; during the call to log-cons
    

  4. Rewrite the APPLY-CLOSURE rule to account for the new abstract syntax and behavior. To help you, simplified LaTeX for the original rule is online.

PARCOM

You may implement parsing combinators, as described in Beyond Regular Expressions.

What to submit

Submit the files [[README]], [[warmup.sml]], and optionally [[varargs.sml]] or [[parcom.sml]], using the function submit-ml. In comments at the top of your [[README]] file, please include your name, the names of any collaborators, and the number of hours you spent on the assignment.