CS 152 (Programming Languages): Supplement to Ullman

This document has a few fine points not covered in Ullman.

Opening structures
Getting record elements
Optional types
Vectors
Parentheses
Style

Opening structures

Ullman sometimes abbreviates by opening structures, e.g., open TextIO. Never do this. It is bad enough to open structures in the standard basis, but if you open other structures, your code will be hopelessly difficult to maintain. Instead, abbreviate structure names as needed. For example, after structure T = TextIO, you can use T.openIn, etc., without (much) danger of confusion.
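Here is a small sketch of the abbreviation in use. The function countLines is a hypothetical example, not something from Ullman. It uses only TextIO.openIn, TextIO.input1, and TextIO.closeIn, which I believe behave the same under both the 1997 and 2004 versions of the basis. Every use of T still makes it obvious which structure the function comes from.

```sml
(* Abbreviate TextIO as T rather than opening it. *)
structure T = TextIO

(* countLines (hypothetical example): count the newline characters in the
   file named filename. T.input1 returns NONE at end of file. *)
fun countLines filename =
  let val ins = T.openIn filename
      fun loop n =
        case T.input1 ins
          of NONE       => n
           | SOME #"\n" => loop (n + 1)
           | SOME _     => loop n
      val n = loop 0
  in  T.closeIn ins; n
  end
```

The semicolon in the let body is deliberate sequencing of imperative code. The stream is closed first, and then the count is returned.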

Getting record elements

Some students have the idea that a good way to get the second element of a pair p is to write #2 p. This style is not idiomatic or readable. The proper way to handle this is by pattern matching, so

fun first  (x, _) = x
fun second (_, y) = y

is preferred, and not

fun bogus_first  p = #1 p
fun bogus_second p = #2 p

(For reasons I don't want to discuss, but will answer in class if asked, these versions don't even type-check.) If your pair or tuple is not an argument to a function, use val to do the pattern matching:

val (x, y) = lookup_pair mumble

But usually you can do the matching directly in the arguments of an ordinary fun definition.

Points will be deducted on homework for using #1, #2, and their friends.
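A minimal sketch of both styles follows. The names swap, divMod, and minutesAndSeconds are hypothetical, chosen only for illustration. Pairs are taken apart either in the fun arguments or with val inside a let, and #1 and #2 never appear.

```sml
(* Match a pair directly in the arguments of fun. *)
fun swap (x, y) = (y, x)

(* divMod (hypothetical helper): quotient and remainder of n by d, as a pair. *)
fun divMod (n, d) = (n div d, n mod d)

(* When a pair is the result of a call, take it apart with val inside a let. *)
fun minutesAndSeconds seconds =
  let val (m, s) = divMod (seconds, 60)
  in  Int.toString m ^ ":" ^ StringCvt.padLeft #"0" 2 (Int.toString s)
  end
```

For example, minutesAndSeconds 125 should produce "2:05".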

Optional types

Let's suppose you want to represent a value, except that the value might not actually be known. For example, I could represent a grade on a homework by an integer, except that the homework might not have been submitted yet. Or the contents of a square on a chessboard are a piece, except that the square might be empty. This problem comes up so often that the initial basis for ML has a special type constructor called option, which lets you handle it. The definition of option is

datatype 'a option = NONE | SOME of 'a

and it is already defined when you start the interactive system. You need not and should not define it yourself.

Some examples

- datatype chesspiece = K | Q | R | N | B | P
- type square = chesspiece option
- val empty : square = NONE
- val lower_left : square = SOME R
- fun play piece = SOME piece : square;
> val play = fn : chesspiece -> chesspiece option

- SOME true; 
> val it = SOME true : bool option
- SOME 37;
> val it = SOME 37 : int option

- SOME "fish" = SOME "fowl";
> val it = false : bool
- SOME "fish" = NONE;
> val it = false : bool
- "fish" = NONE;
! Toplevel input:
! "fish" = NONE;
!          ^^^^
! Type clash: expression of type
!   'a option
! cannot be made to have type
!   string

The option type is covered in Ullman on pages 111-113, 208, etc.
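The usual way to consume an option is by pattern matching on NONE and SOME. Below is a sketch based on the homework-grade example above. The type grade and the functions totalSubmitted and showGrade are hypothetical names for illustration.

```sml
(* A homework grade, or NONE if the homework has not been submitted. *)
type grade = int option

(* Add up the submitted grades, skipping the missing ones. *)
fun totalSubmitted ([] : grade list) = 0
  | totalSubmitted (SOME g :: gs) = g + totalSubmitted gs
  | totalSubmitted (NONE :: gs) = totalSubmitted gs

(* Show a grade, matching on the option rather than calling valOf. *)
fun showGrade (NONE : grade) = "not submitted"
  | showGrade (SOME g) = Int.toString g
```

Matching forces you to say what happens in the NONE case, which valOf silently skips (it raises an exception instead).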

Vectors

Although Ullman describes the mutable Array structure in Chapter 7, he covers the immutable Vector structure only in a couple of pages deep in Chapter 9. Like an array, a vector offers constant-time access to a sequence of elements, but a vector cannot be modified. Because of its immutability, Vector is often preferable to Array. It is especially flexible when initialized with Vector.tabulate, which builds a vector of length n whose element at index i is f i. Here's the signature:

signature VECTOR =
  sig
    eqtype 'a vector
    val maxLen : int
    val fromList : 'a list -> 'a vector
    val tabulate : int * (int -> 'a) -> 'a vector
    val length : 'a vector -> int
    val sub : 'a vector * int -> 'a
    val extract : 'a vector * int * int option -> 'a vector
    val concat : 'a vector list -> 'a vector
    val app : ('a -> unit) -> 'a vector -> unit
    val foldl : ('a * 'b -> 'b) -> 'b -> 'a vector -> 'b
    val foldr : ('a * 'b -> 'b) -> 'b -> 'a vector -> 'b
    val appi : (int * 'a -> unit) -> 'a vector * int * int option -> unit
    val foldli : (int * 'a * 'b -> 'b)
                 -> 'b -> 'a vector * int * int option -> 'b
    val foldri : (int * 'a * 'b -> 'b)
                 -> 'b -> 'a vector * int * int option -> 'b
  end

It makes me deeply unhappy to have to warn you that the signature for Vector was changed in 2004. The MLton compiler has tracked this change, Standard ML of New Jersey has tracked parts of it, and Moscow ML has not tracked it at all. For simplicity, you are best off sticking with Moscow ML and using MLton with the -basis 1997 option, but you need to know that these are not consistent with the current documentation on the web.
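Here is a minimal sketch of Vector.tabulate. To sidestep the version problem just described, it uses only tabulate, sub, length, and foldl, whose types are the same in the old and new signatures. The names squares and sumVector are hypothetical.

```sml
(* squares n: the vector 0*0, 1*1, ..., (n-1)*(n-1), built with Vector.tabulate *)
fun squares n = Vector.tabulate (n, fn i => i * i)

(* sumVector: add up the elements of an int vector with Vector.foldl *)
fun sumVector v = Vector.foldl (op +) 0 v
```

For example, Vector.sub (squares 5, 3) should be 9, and sumVector (squares 5) should be 30.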

Parentheses

It's easy to be confused about when you need parentheses. Here's a checklist to tell you when you need parentheses around an expression or a pattern:

Is it an argument to a (possibly Curried) function, and if so, is it more than a single token?
Is it an infix expression that has to be parenthesized because the precedence of another infix operator would do the wrong thing otherwise?
Are you forming a tuple?
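Is it an expression involving fn, case, or handle? (Each of these extends as far to the right as possible, so it swallows whatever follows it unless it is parenthesized.)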

If the answer to any of these questions is yes, use parentheses. Otherwise, you almost certainly don't need them, so get rid of them!
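The checklist can be seen in a few small functions. This is a sketch, and the names nextLabel, larger, and firstOrZero are hypothetical. Each pair of parentheses below answers one of the questions above, and no other parentheses appear.

```sml
(* A multi-token argument needs parentheses; a single token like n would not. *)
fun nextLabel n = "item " ^ Int.toString (n + 1)

(* Forming a tuple needs parentheses; the if condition does not. *)
fun larger (a, b) = if a > b then a else b

(* The inner case must be parenthesized. Without the parentheses, the arm
   NONE => 0 would be read as a third arm of the inner case, which matches
   on an int list, so the function would not type-check. *)
fun firstOrZero (xso : int list option) =
  case xso
    of SOME xs => (case xs of [] => 0 | x :: _ => x)
     | NONE    => 0
```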

Style

Ullman's style is less than ideal. Here are some short recommendations.

Never write x = nil. Either use null (that's why it's in the initial basis) or use pattern matching.
You need the semicolon at interactive toplevel, but it should almost never appear in your code. Don't use a semicolon unless you are deliberately sequencing imperative code. Ullman's book is full of unnecessary semicolons.
As per the checklist above, never parenthesize the condition in an if expression.
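A short sketch of the first recommendation follows, with hypothetical names describeList and sum. There is one further reason to avoid x = nil. Using = forces the elements to have an equality type, so fun isEmpty xs = xs = nil gets type ''a list -> bool and cannot be applied to a list of reals or functions. Neither version below has that restriction. The block also contains no semicolons and no parentheses around the if condition.

```sml
(* Test for the empty list with null rather than xs = nil.
   The type is 'a list -> string, so any list works. *)
fun describeList xs = if null xs then "empty" else "nonempty"

(* Or, usually better, use pattern matching. *)
fun sum [] = 0
  | sum (x :: xs) = x + sum xs
```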