CS152 Homework: Core ML

Due Tuesday, November 7, at 12:10 AM (a.k.a. Monday night at midnight)

This assignment has two parts. The purpose of the first part is to help you get acclimated to programming in ML by writing many small exercises. The second part, on type checking, will help prepare you for type inference.

Guidelines

Use function definition by pattern matching for all the problems in this homework. In particular, do not use the functions [[null]], [[hd]], and [[tl]]; use patterns instead. Some useful list patterns include these patterns, to match lists of exactly 0, 1, 2, or 3 elements:
```
[]
[x]
[x, y]
[a, b, c]
```
and also these patterns, which match lists of at least 0, 1, 2, or 3 elements:
```
l
h::t
x1::x2::xs
a::b::c::l
```
When using these patterns, remember that function application has higher precedence than any infix operator! This is as true in patterns as it is anywhere else.
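As an illustration of the precedence rule, here is a small sketch. The function [[pairSums]] is a made-up example, not one of the homework problems. Note the parentheses around the cons pattern: without them, the clause head would parse as [[(pairSums x1)::x2::xs]], which is not a legal function clause.

```sml
(* Hypothetical example: add up adjacent pairs of a list of integers.
   The three clauses match lists of exactly 0, exactly 1, and at least 2 elements. *)
fun pairSums []           = []
  | pairSums [x]          = [x]
  | pairSums (x1::x2::xs) = (x1 + x2) :: pairSums xs

val sums = pairSums [1, 2, 3, 4, 5]
```

With the definitions above, [[sums]] is the list [[[3, 7, 5]]].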

Do not define auxiliary functions at top level. Use [[local]] or [[let]]. Do not use any imperative features unless the problem explicitly says it is OK.
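The two ways of hiding a helper might look like the following sketch. The names [[square]], [[sumSquares]], [[sumSquares']], and [[go]] are invented for illustration and are not part of the assignment.

```sml
(* Hiding a helper with local: square is visible only between in and end. *)
local
  fun square (x : int) = x * x
in
  fun sumSquares []      = 0
    | sumSquares (x::xs) = square x + sumSquares xs
end

(* Hiding a helper with let: go is visible only inside sumSquares'. *)
fun sumSquares' xs =
  let fun go (acc, [])    = acc
        | go (acc, y::ys) = go (acc + y * y, ys)
  in  go (0, xs)
  end
```

Either form keeps the helper out of the top-level environment. [[local]] is convenient when several top-level functions share one helper; [[let]] is convenient when the helper needs to refer to the enclosing function's arguments.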

All the sample code we show you is gathered in one place online. If it is useful to you, a Linux implementation of Moscow ML is readily available; see the Moscow ML Home Page for more information.

Part I: ML Warmup

Put all the solutions to this part in one file: warmup.sml. In order to receive credit for this part, your warmup.sml file must compile and execute in the Moscow ML system. For example, we must be able to compile your code without warnings or errors:

is03 /tmp >> /usr/local/mosml/bin/mosmlc -c warmup.sml
is03 /tmp >>

Please remember to put your name, login, and time spent in the file.

Solve the following problems:

Higher-order programming

[3pts] Here's a function that is somewhat like [[fold]], but it works on binary operators.
1. Define a function
```
compound : ('a * 'a -> 'a) -> int -> 'a -> 'a
```
  that "compounds" a binary operator [[rator]] so that [[compound rator n x]] is [[x]] if [[n = 0]], [[x rator x]] if [[n = 1]], and in general [[x rator (x rator (... rator x))]] where [[rator]] is applied exactly [[n]] times. [[compound rator]] need not behave well when applied to negative integers.
2. When [[rator]] is associative, it is not necessary to apply it so many times. Define a function [[acompound]] that has the same type as [[compound]], and for an associative [[rator]] computes the same results, but such that [[acompound rator n x]] requires only O(log n) applications of [[rator]] to compute.
3. Use the [[acompound]] function to define a function for integer exponentiation
```
pow : int -> int -> int
```
  so that, for example, [[pow 3 2]] evaluates to 9. Hint: take note of the description of [[op]] in Ullman S5.4.4, page 165.
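In case the hint about [[op]] is unfamiliar, the following sketch shows the keyword turning an infix operator into an ordinary function on pairs. The value names are made up for illustration. One thing to watch: when you parenthesize the multiplication operator, leave a space before the closing parenthesis, because an asterisk followed immediately by a right parenthesis is read as the end of a comment.

```sml
(* op converts an infix operator into a function that takes a pair. *)
val seven    = op + (3, 4)
val consed   = op :: (1, [2, 3])
val products = map op * [(1, 2), (3, 4)]

(* Note the space after the asterisk. *)
val times = (op * )
val six   = times (2, 3)
```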

Patterns

[2pts] Consider the pattern [[(x::y::zs, w)]]. For each of the following expressions, tell whether the pattern matches the value denoted. If the pattern matches, say what values are bound to the four variables [[x]], [[y]], [[zs]], and [[w]]. If it does not match, explain why not.
1. [[([1, 2, 3], ("CS", 152))]]
2. [[(("CS", 152), [1, 2, 3])]]
3. [[([("CS", 152)], (1, 2, 3))]]
4. [[(["CS", "152"], true)]]
5. [[([true, false], 2.718281828)]]
[1pt] Using patterns, write a recursive Fibonacci function that does not use [[if]].
[2pts] Write a function that takes a list of lower-case letters and returns [[true]] if the first character is a vowel and [[false]] if the first character is not a vowel. Use the wildcard symbol [[_]] whenever possible, and avoid [[if]]. Remember that the ML character syntax is [[#"x"]], as described in Ullman, page 13.
[1pt] Write the function [[null]], which when applied to a list tells whether the list is empty. Avoid [[if]], and make sure the function takes constant time.

Lists

[1pt] [[foldl]] and [[foldr]] are predefined with type
```
('a * 'b -> 'b) -> 'b -> 'a list -> 'b
```
They are like the uScheme versions except the ML versions are Curried.
1. Implement [[length]] using [[foldl]] or [[foldr]].
2. Implement [[rev]] using [[foldl]] or [[foldr]].
3. Implement [[minlist]], which returns the smallest element of a non-empty list of integers. Your solution should work regardless of the representation of integers (e.g., it should not matter how many bits are used to represent integers). Your solution can fail (e.g., by [[raise Match]]) if given an empty list of integers. Use [[foldl]] or [[foldr]].
Do not use recursion in any of your solutions.
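To see what the Curried fold types buy you, and how the two folds differ in the order they visit elements, consider this sketch. It uses string concatenation and addition rather than any of the functions you are asked to write; the value names are invented.

```sml
(* foldr combines from the right end, foldl from the left end.
   The operator receives (element, accumulator) in both cases. *)
val abc = foldr op ^ "" ["a", "b", "c"]
val cba = foldl op ^ "" ["a", "b", "c"]

(* Because the folds are Curried, supplying only the first two
   arguments yields a new function of type int list -> int. *)
val sumList = foldl op + 0
val ten     = sumList [1, 2, 3, 4]
```

Here [[abc]] is [["abc"]] and [[cba]] is [["cba"]], because [[foldl]] hands the operator the leftmost element first and each later element is concatenated in front of the result so far.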
[1pt] Implement [[foldl]] and [[foldr]] using recursion. Do not create unnecessary cons cells. Do not use [[if]].
[8pts] Implement queues using no side effects.
1. For a first cut, try the following representation:
```
exception Empty
type 'a queue = 'a list
val put : 'a queue * 'a -> 'a queue
val get : 'a queue -> 'a * 'a queue
```
  Implement [[put]] and [[get]]. [[get]] should raise the exception [[Empty]] if the queue is empty.
2. The representation shown above is unpleasant in that either [[put]] or [[get]] requires O(n) time. Using a pair of lists to represent a queue, implement [[put]] and [[get]] that take constant "amortized" time. (That is, a combination of [[n]] puts and gets, in any reasonable order, can be expected to take O(n) time total, instead of possibly O(n-squared) as above.) Hint: think about the tricks we used in class to come up with a cheap list-reversal function.
[2pts] Write a function [[zip : 'a list * 'b list -> ('a * 'b) list]] that takes a pair of lists (of equal length) and returns the equivalent list of pairs. Raise the exception [[Mismatch]] if the lengths don't match.
[2pts] Define a function
```
pairfoldr : ('a * 'b * 'c -> 'c) -> 'c -> 'a list * 'b list -> 'c
```
that applies a three-argument function to a pair of lists of equal length, using the same order as [[foldr]]. Use [[pairfoldr]] to implement [[zip]].
[3pts] Define a function [[unzip : ('a * 'b) list -> 'a list * 'b list]] that turns a list of pairs into a pair of lists. This one is tricky; here's a sample result:
```
- unzip [(1, true), (3, false)];
> val it = ([1, 3], [true, false]) : int list * bool list
```
Hint: Try defining an auxiliary function that uses the method of accumulating parameters, and be prepared to use [[rev]].
[2pts] Define a function [[split : 'a list -> 'a list * 'a list]] that takes a list and separates it into its even- and odd-numbered elements. Unlike the previous functions, this one need not preserve the order of the elements.
[4pts] Using [[split]], implement a higher-order [[mergesort]] function with type
```
mergesort : ('a * 'a -> bool) -> 'a list -> 'a list
```
You'll need two auxiliary functions, one to merge and one to sort. You'll also need two base cases for the sort---not just the empty list, but also the list containing one element.

Exceptions

[3pts] Write a (Curried) function [[nth : int -> 'a list -> 'a]] to return the nth element of a list. (Number elements from 0.) Define one or more suitable exceptions to tell what is wrong in case the function is not defined on its arguments. Raise the appropriate exception in response to erroneous inputs.
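If you have not defined exceptions before, the following sketch shows the mechanics on two unrelated functions: one exception without an argument and one that carries a value. All the names here ([[EmptyList]], [[BadDigit]], [[last]], [[digitValue]]) are made up for illustration.

```sml
exception EmptyList            (* carries no information *)
exception BadDigit of char     (* carries the offending character *)

fun last []        = raise EmptyList
  | last [x]       = x
  | last (_ :: xs) = last xs

fun digitValue c =
  case Char.isDigit c
    of true  => ord c - ord #"0"
     | false => raise BadDigit c

(* A handler recovers from the exception and supplies a result instead. *)
val lastOrZero = last ([] : int list) handle EmptyList => 0
val seven      = digitValue #"7"
val minusOne   = digitValue #"x" handle BadDigit _ => ~1
```

An exception that carries a value lets the caller report exactly what went wrong, which is worth considering when you decide what "suitable" means for [[nth]].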
[8pts] Environments
1. Define a type [['a env]] and functions
```
type 'a env = (* you fill in this part *)
exception NotFound of string
val emptyEnv : 'a env = (* ... *)
val bindVar : string * 'a * 'a env -> 'a env = (* ... *)
val lookup : string * 'a env -> 'a = (* ... *)
```
  such that you can use [['a env]] for a type environment or a value environment. Raise the exception [[NotFound]] on the attempt to look up an identifier that doesn't exist. Don't worry about efficiency.
2. Do the same, except make [[type 'a env = string -> 'a]], and let
```
fun lookup (name, rho) = rho name
```
  (This should remind you of an extra-credit problem from the Scheme homework.)
3. Write a function [[isBound : string * 'a env -> bool]] that works with both representations of environments. That is, write a single function that works regardless of whether environments are implemented as lists or as functions. You will need imperative features, like sequencing (the semicolon). Don't use [[if]].
4. Write a function [[extendEnv : string list * 'a list * 'a env -> 'a env]] that takes a list of variables and a list of values and adds the corresponding bindings to an environment. It should work with both representations. Do not use recursion. Hint: you can do it in two lines using the higher-order list functions defined above.

Discriminated unions ([[datatype]])

[7pts] Search trees.
ML can easily represent binary trees containing arbitrary values in the nodes:
```
datatype 'a tree = NODE of 'a tree * 'a * 'a tree | LEAF
```
To make a search tree, we need to compare values at nodes. The standard idiom for comparison is to define a function that returns a value of type [[order]]. As discussed in Ullman, page 325, [[order]] is predefined by
```
datatype order = LESS | EQUAL | GREATER
```
Because [[order]] is predefined, if you include it in your program, you will hide the predefined version (which is in the so-called "initial basis") and other things may break mysteriously. So don't include it.
We can use the [[order]] idiom to define a higher-order insertion function by, e.g.,
```
fun insert cmp =
  let fun ins(x, LEAF) = NODE(LEAF, x, LEAF)
        | ins(x, NODE(left, y, right)) =
            (case cmp(x, y)
               of LESS    => NODE(ins(x, left), y, right)
                | GREATER => NODE(left, y, ins(x, right))
                | EQUAL   => NODE(left, x, right))
  in  ins
  end
```
This higher-order insertion function accepts a comparison function as argument, then returns an insertion function. (The parentheses around [[case]] aren't actually necessary here, but I've included them because if you leave them out when they are needed, I promise you will be very confused by the resulting error messages.)
We can use this idea to implement polymorphic sets in which we store the comparison function in the set itself. For example,
```
datatype 'a set = SET of ('a * 'a -> order) * 'a tree
fun nullset cmp = SET (cmp, LEAF)
```
- Write a function [[addelt]] of type [['a * 'a set -> 'a set]] that adds an element to a set.
- Write a function [[treeFoldr]] of type [[('a * 'b -> 'b) -> 'b -> 'a tree -> 'b]] that folds a function over every element of a tree, rightmost element first. [[treeFoldr op :: [] t]] should return the elements of [[t]] in order. Write a similar function [[setFold]] of type [[('a * 'b -> 'b) -> 'b -> 'a set -> 'b]].
  The function [[setFold]] should visit every element of the set exactly once, in an unspecified order.
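To make the higher-order [[insert]] function concrete, here is a sketch of it in use. The [[tree]] datatype and [[insert]] are repeated from the text so that the example stands alone; [[insertInt]] and [[t]] are names invented for illustration, and [[Int.compare]] is the comparison function from the initial basis.

```sml
datatype 'a tree = NODE of 'a tree * 'a * 'a tree | LEAF

fun insert cmp =
  let fun ins (x, LEAF) = NODE (LEAF, x, LEAF)
        | ins (x, NODE (left, y, right)) =
            (case cmp (x, y)
               of LESS    => NODE (ins (x, left), y, right)
                | GREATER => NODE (left, y, ins (x, right))
                | EQUAL   => NODE (left, x, right))
  in  ins
  end

(* Supplying the comparison yields an insertion function on integer trees. *)
val insertInt = insert Int.compare

(* The insertion function has the right shape to be folded over a list.
   The second 2 compares EQUAL to the root and so replaces it. *)
val t = foldl insertInt LEAF [2, 1, 3, 2]
```

After these definitions, [[t]] has 2 at the root, 1 in the left subtree, and 3 in the right subtree.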
Extra credit: Recall the following problem from the Scheme homework:
Consider the class of well-formed arithmetic computations using the numeral 5. These are expressions formed by taking the integer literal 5, the four arithmetic operators +, -, *, and /, and properly placed parentheses. Such expressions correspond to binary trees in which the internal nodes are operators and every leaf is a 5. Write a uScheme program to answer one or more of the following questions:

What is the smallest positive integer that cannot be computed by an expression involving exactly five 5's?
What is the largest prime number that can be computed by an expression involving exactly five 5's?
Exhibit an expression that evaluates to that prime number.
Write an ML function [[reachable]] of type
```
('a * 'a -> order) * ('a * 'a -> 'a) list -> 'a -> int -> 'a set
```
such that [[reachable (Int.compare, [op +, op -, op *, op div]) 5 5]] computes the set of all integers computable using the given operators and exactly five 5's. (You don't have to bother giving the answers to the questions above, since they're easy to get with [[setFold]].) My solution is under 20 lines of code, but it makes heavy use of the [[setFold]], [[nullset]], [[addelt]], and [[pairfoldr]] functions defined earlier.
Hints:

Part II: Lambda Calculus and type checking

Put all the solutions to this part in one file: types.sml. As above, your code must compile without errors or warnings.

[10pts] Untyped lambda calculus.
1. Define a recursive type [[lambda]] to represent terms in the untyped lambda calculus. There are three cases: variables, lambda abstractions, and applications. Therefore you will need an ML [[datatype]] with three constructors. Use strings to represent variables.
2. Using your type, exhibit an ML value that represents the term $\lambda x.\lambda y.x$
3. Define a function [[free : lambda -> string list]] to list all the free variables in a lambda term. Do not list bound variables!
[12pts] Typed lambda calculus.
1. Define a recursive type [[ltype]] to represent types in the first-order typed lambda calculus. Represent all basic types by a single constructor [[BASIC of string]].
2. Define a recursive type [[tlambda]] to represent terms in the first-order typed lambda calculus.
3. Using your type, exhibit an ML value that represents the term $\lambda x:int.\lambda y:int \rightarrow int.x$
4. Define a function [[termString : tlambda -> string]], which returns an ASCII string representing the term. Don't worry about suppressing redundant parentheses; for example, you might use [["\x:int.(\y:(int->int).x)"]] to represent the term above.
[18pts] Type checking.
Using your definitions of [[tlambda]] and [[ltype]] from the previous problem, define a function [[typeof : tlambda * ltype env -> ltype]] such that [[typeof(M, Gamma) = A]] if and only if [[Gamma |- M : A]] according to the type rules given in class. Define [[exception IllTyped of tlambda * ltype env]] and raise it if the term [[M]] is not well typed.
Exploit pattern matching as much as possible in your definition. Remember that ML does not have nonlinear patterns; for example, you cannot write a pattern [[(A, A)]] to match [[(3, 3)]] but not [[(3, 4)]]. Instead you have to write something like [[(A, A')]] and test for [[A = A']] explicitly.
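The workaround for nonlinear patterns might look like the following sketch, which has nothing to do with type checking. The function [[dedup]] is a made-up example that drops adjacent duplicates from a list. What we would like to write is a clause with the pattern [[x :: x :: rest]], but ML rejects a pattern that binds the same variable twice, so the clause binds two different names and compares them explicitly.

```sml
(* Hypothetical example: remove adjacent duplicates.
   The two heads get distinct names and are tested with =. *)
fun dedup (x :: x' :: rest) =
      (case x = x'
         of true  => dedup (x' :: rest)
          | false => x :: dedup (x' :: rest))
  | dedup xs = xs

val cleaned = dedup [1, 1, 2, 2, 2, 3, 1]
```

Here [[cleaned]] is [[[1, 2, 3, 1]]]. The explicit equality test means [[dedup]] works only on equality types, which is also something to keep in mind when you compare values of type [[ltype]].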
Hints:

Proceed by structural induction on [[M]]. There will be only one possible rule to apply at each stage.
Remember that you follow rules from conclusions to premises.
You may find it easier to try to define a function [[hastype : tlambda * ltype * ltype env -> bool]] such that [[hastype(M, A, Gamma)]] is true if and only if [[Gamma |- M : A]] according to the type rules given in class. You will discover you can't do the application case, because the premises for the application rule mention a type A that appears nowhere in the conclusion. But making the attempt should tell you enough to write [[typeof]].
In your type checker, you will follow the type introduction and elimination rules from the bottom up, as discussed in class. But, if your code needs to build new environments or new types, you will follow the type and environment formation rules from the top down.
As far as proof obligations go, you may assume that the initial [[Gamma]] passed to [[typeof]] is well formed. It is your obligation to ensure that every new environment you might build is also well formed. Thus, if you wish to create new environments in your code, you must do so according to the typing rules for well formed environments.
I recommend that you implement a slightly different set of type rules than are in the notes: you should permit an inner binding of a variable name to hide an outer binding of the same name. (This change would complicate the formal notation, but it will simplify your code.)
[10pts] More types.
Extend your answer to the type-checking problem above by adding either sum types or product types to the system. Be sure you also add all the appropriate terms, and extend your type checker.

Extra Credit. Making lambda notation. Take the uScheme interpreter from Chapter 5 of Kamin, Ramsey, and Cox, and add a primitive [[TeXify]] that takes an S-expression representing a Scheme program and spits out TeX for the lambda notation:
```
-> (TeXify '(lambda (x) (lambda (y) x)))
( \lambda x . \lambda y . x )
```
Your primitive should make sure that every abstraction and application has exactly one argument; if not, it can raise [[Type]] with a suitable error message. Make sure there are no redundant parentheses.

Hint: solve the problem in steps:

Translate the type [[value]] into your datatype for untyped lambda calculus.
Write a function converting that to a list of strings. Here's where you'll figure out the parentheses---there aren't many cases.
Now translate the list of strings back to [[value]] and return it.

What to submit

Submit two files [[warmup.sml]] and [[types.sml]]. The exercises are all short, so comments in the source file will suffice; you need not create a separate README file. In comments at the top of your files, please include your name, the names of any collaborators, and the number of hours you spent on the assignment.

If you want to submit up to 3 test cases to be applied to other people's code, submit them in a file called [[tests]], complete with comments that tell us how to run them.