Standard ML Style Guide for COMP 105

(This style guide has been adapted from a style guide used at Cornell in courses long taught by Greg Morrisett and Andrew Myers.)

Programming-language people care about concise, precise, and elegant code. We take style serously. Below are some suggestions about what constitutes proper style for programs written in Standard ML.

Basic courtesy

Code must fix in 80 columns. No line of code may have more than 80 columns.

Code may not contain tab characters. The width of a tab varies with the machine, and what looks good on your machine may look terrible on mine. For indentation, use spaces.

Code must compile. Any code you submit must compile without errors or warnings. No excuses. Never submit anything that you have changed, no matter how small the change, without checking that it still compiles.

Comments

Minimize comments. These sorts of comments should be eliminated:
  • A comment that contains the same information as the code it refers to
  • A comment that provides a narrative description of a sequence of events in a computation, like an algorithm
  • A comment that provides information which could instead be conveyed by the name of a variable
  • A comment that provides information which could instead be conveyed by the type of a variable or function
  • A long comment in the middle of a definition
  • These sorts of comments are encouraged:
  • A comment that states a function's contract
  • A comment that explains what a let-bound or lambda-bound name represents
  • A comment which shows algebraic laws that a function obeys
  • If you wish to explain a design or refer to an external description of an algorithm or data structure, a long comment at the top of a block of code may be OK.

    Use blank lines carefully. Blank lines are useful primarily at top level, to separte distinct definitions or groups of related definitions.

    Unless function definitions within a let block are long, there should be no blank lines within a let block. There should never be a blank line within an expression or between two clauses that define the same function.

    Naming and Declarations

    Naming Conventions. The best way to tell at a glance something about the type of a variable is to use the standard SML naming conventions. The following are the preferred rules that are followed by the SML basis and SML/NJ libraries:

    Token SML Naming Convention Example
    Variables A variable is symbolic or begins with a small letter. Multiword names use embedded capital letters. getItem
    Constructors A constructor is symbolic or uses all capital letters. Multiword names use underscores. (Milner's contructors nil, true, and false are grandfathered.) Symbolic constructors are rare. NODE
    EMPTY_QUEUE
    Types A type is written with small letters. Multiword names use underscores. priority_queue
    Signatures A signature is written with all capital letters. Multiword names use underscores. PRIORITY_QUEUE
    Structures A structure begins with a capital letter. Multiword names use embedded capital letters. PriorityQueue
    Functors A functor uses the same naming convention as a structure. Its name may begin with Mk or end with Fn. PriorityQueueFn

    Sadly, these conventions are not enforced by the compiler.

    Use meaningful names. The name of a variable or function should say something about what the variable represents or what the function computes. In small scopes, short variable names are acceptable, as in C; here is an example:

    let
      val d = Date.fromTimeLocal(Time.now())
      val m = Date.minute d
      val s = Date.second d
      fun f n = (n mod 3) = 0
    in
      List.filter f [m,s]
    end
    Conventional names include f for a function, s for a string, n for an integer, x or y for a value of unknown type, a or b for a value of unknown type, xs or ys for a list, and so on.

    Definitional interpreters have their own conventions: x for a name, e for an expression, v for a value, d for a definition, tau for a type, rho for an environment, and many others.

    Indentation and code layout

    Indent by two or three spaces. Be consistent.

    Long expressions can be broken up and the parts aligned, as in the second example. Either is acceptable.

    val x = "Long line..."^
      "Another long line."
    
    val x = "Long line..."^
            "Another long line."

    Case expressions should be indented as follows:

    case expr
      of pat1 => ...
       | pat2 => ...
    If the expressions in the case are short enough that each arm can fit entirely on one line, it is sometimes good not to have a line break before the of:
    case expr of pat1 => ...
               | pat2 => ...

    If expressions can be indented in many acceptable ways:

    if exp1 then exp2              if exp1
    else if exp3 then exp4         then exp2
    else if exp5 then exp6         else exp3
         else exp8
    
    if exp1 then exp2 else exp3    if exp1 then exp2
                                   else exp3
    if exp1 then
      exp2
    else
      exp3
    

    Parentheses

    It's easy to be confused about when you need parentheses. Here's a checklist to tell you when you need parentheses around an expression or a pattern:
    1. Is it an argument to a (possibly Curried) function, and if so, is it more than a single token?
    2. Is it an infix expression that should be parenthesized because it is the argument of another infix operator?
    3. Are you forming a tuple?
    4. Are you parenthesizing an expression involving fn, case, or handle?
    If the answer to any of these questions is yes, use parentheses. Otherwise, you almost certainly don't need them---so get rid of them!

    Function application and infix operations. The space character should be used liberally as a separator.

  • Between a function and its argument
  • Between an infix operator and its arguments
  • Between elements of a tuple
  • 1 + max (height l, height r)  (* right *)
    
    1+max(height l,height r)      (* 4x WRONG *)
    

    Blocks or definitions. Blocks of definitions such as are found in let...in...end, struct...end, or sig...end should be indented as follows:

    fun foo bar =
      let val p = 4
          val q = 8
      in  bar * (p + q)
      end
    
    In rare cases it may be acceptable to introduce additional vertical space:
    fun foo bar =
      let (* this form should be rare *)
        val p = 4
        val q = 8
      in
        bar * (p + q)
      end
    

    Pattern matching and case analysis.

    Pattern matches must be complete. Our special version of Moscow ML will reject code with incomplete pattern matches. Other compilers will give warnings. Any file that includes an incomplete pattern match earns No Credit.

    For pattern matching and analysis, prefer fun. Do as much work as possible in the arguments of the fun definition form:

    (* GOOD *)
    fun f (x, y) (z, _) = ...
    
    (* BAD *)
    fun f arg1 arg2 =
      let val (x, y) = arg1
          val (z, w) = arg2  (* note unused 'w' *)
      in  ...
      end
    
    (* OUTRIGHT WRONG *)
    fun f arg1 arg2 =
      let val x = #1 arg1 (* may not typecheck *)
          val y = #2 arg1 (* may not typecheck *)
          val z = #1 arg2 (* may not typecheck *)
      in  ...
      end
    

    Prefer fun to case.

    (* GOOD *)
    fun length []      = 0
      | length (x::xs) = 1 + length xs
    
    (* BAD *)
    fun length l = 
      case l
        of []      => 0
         | x :: xs => 1 + length xs
    

    Avoid the projection operators. It is almost never correct for a beginner to use operations such as #1, #2, and so on. Instead use pattern matching. Auxiliary functions are OK:

    (* GOOD *)
    fun fst (x, _) = x
    fun snd (_, y) = y
    


    Combine nested case Expressions. Rather than nest case expressions, you can combine them by pattern matching against a tuple, provided the tests in the case expressions are independent. Here is an example:

    Bad
         let val d = Date.fromTimeLocal(Time.now())
         in  case Date.month d
               of Date.Jan => (case Date.day d
                                 of 1 => print "Happy New Year"
                                  | _ => ())
                | Date.Jul => (case Date.day d
                                 of 4 => print "Happy Independence Day"
                                  | _ => ())
                | Date.Oct => (case Date.day d
                                  of 10 => print "Happy Metric Day"
                                  | _ => ())
         end
    

    Good
         let val d = Date.fromTimeLocal(Time.now())
         in  case (Date.month d, Date.day d) 
               of (Date.Jan, 1)  => print "Happy New Year"
                | (Date.Jul, 4)  => print "Happy Independence Day"
                | (Date.Oct, 10) => print "Happy Metric Day"
                | _ => ()
         end

    Don't use valOf, hd, or tl. Use pattern matching.

    Handling long expressions

    Avoid breaking expressions over multiple lines. You can split a record or a tuple over multiple lines if you use "MacQueen notation": the opening and closing brackets, together with the separating commas, appear leftmost.

    (* HORRID *)
       fun mid   (x, y, z) = y
       fun right (x, y, z) = z
    
       fun euclid (m, n) : int * int * int =
         if n = 0 then (b 1, b 0, m)
         else (mid (euclid (n, m mod n)), u - (m div n) *
                 (euclid (n, m mod n)), right (euclid (n, m mod n)))
    
    (* STILL BAD, BUT THE TUPLE LOOKS GOOD *)
       fun mid   (x, y, z) = y
       fun right (x, y, z) = z
    
       fun euclid (m, n) : int * int * int =
         if n = 0 then (b 1, b 0, m)
         else ( mid   (euclid (n, m mod n))
              , left  (euclid (n, m mod n)) - (m div n) * mid (euclid (n, m mod n))
              , right (euclid (n, m mod n))
              )
           (* the layout of the tuple is OK, but not much else is right *)
    
    A better plan, especially for this example, is to use a let construct to name intermediate results:
    (* GOOD *)
         fun euclid (m, n) : int * int * int =
           if n = 0 then
             (b 1, b 0, m)
           else
             let val (q, r)    = (m div n, m mod n)  (* quotient/remainder *)
                 val (u, v, g) = euclid (n, r)
             in  (v, u - q * v, g)
             end
    

    Naming intermediate results has a cost.

    When intermediate results are short and simple, it's usually clearer not to name them

    Bad

         let val x = TextIO.inputLine TextIO.stdIn
         in  case x of ...
         end
    

    Good

         case TextIO.inputLine TextIO.stdIn of ...
    

    Bad (provided y is not a large expression):

    let val x = y*y in x+z end
    Good
    y*y + z

    Shorter is sweeter

    Don't duplicate the initial basis. Especially don't write your own versions of filter, map, and so on. When you can, use foldr.

    Shorter is sweeter. An if expression of type bool can often be rewritten:
    Bad Good
    if e then true else false e
    if e then false else true not e
    if beta then beta else false beta
    if x then true else y x orelse y
    if x then y else false x andalso y
    if x then false else y not x andalso y
    if x then y else true not x orelse y

    More short sweetness.
    Bad Good
    l::nil [l]
    l::[] [l]
    length + 0 length
    length * 1 length
    big * big let val x = big in x*x end
    if x then f a b c1
    else f a b c2
    f a b (if x then c1 else c2)
    String.compare(x,y)=EQUAL x=y
    String.compare(x,y)=LESS x<y
    String.compare(x,y)=GREATER x>y
    Int.compare(x,y)=EQUAL x=y
    Int.compare(x,y)=LESS x<y
    Int.compare(x,y)=GREATER x>y
    Int.sign(x)=~1 x<0
    Int.sign(x)=0 x=0
    Int.sign(x)=1 x>0


    Don't Rewrap Functions. When passing a function as an argument to another function, don't rewrap the function unnecessarily. Here's an example:

    List.map (fn x => Math.sqrt x) [1.0, 4.0, 9.0, 16.0]    (* WRONG *)
    
    List.map Math.sqrt [1.0, 4.0, 9.0, 16.0]   (* RIGHT *)
    

    The latter is better. Another case for rewrapping a function is often associated with infix binary operators. To prevent rewrapping the binary operator, use the op keyword as in the following example:

    foldl (fn (x,y) => x + y) 0    (* BAD *)
    
    foldl (op +) 0                 (* GOOD *)
    

    The latter is better.

    Replace nested let with a single let.

    (* BAD *)
    let val x = 42
    in
      let val y = x + 101
      in  x + y
      end
    end
    
    (* GOOD *)
    let val x = 42  
        val y = x + 101
    in  x + y
    end
    

    Common mistakes to avoid

    (This material comes from the course supplement to Ullman.)

    Avoid open; it's the devil's tool

    Ullman sometimes abbreviates by opening structures, e.g., open TextIO. Never do this---it is bad enough to open structures in the standard basis, but if you open other structures, your code will be hopelessly difficulty to maintain. Instead, abbreviate structure names as needed. For example, after structure T = TextIO, you can use T.openIn, etc., without (much) danger of confusion.

    Avoid #1 and #2

    Some beginners pick up the idea that a good way to get the second element of a pair p is to write #2 p. This style is not idiomatic or readable, and it can confuse the type checker. The proper way to handle pairs is by pattern matching, so
    fun first  (x, _) = x
    fun second (_, y) = y
    
    is preferred, and not
    fun bogus_first  p = #1 p
    fun bogus_second p = #2 p
    
    (For reasons I don't want to discuss, these versions don't even type-check.) If your pair or tuple is not an argument to a function, use val to do the pattern matching:
    val (x, y) = lookup_pair mumble
    
    But usually you can include matching in ordinary fun matching.

    Avoid polymorphic equality

    It is almost never correct to compare an expression with a constructor; use pattern matching instead. Or even better, if the type is predefined, you may be able to use a function from the initial basis
      xs = nil    (* WRONG *)
    
      (case xs of [] => true | _ :: _ => false)  (* BETTER *)
    
      null xs     (* BEST *)
    
    
    
      e = NONE (* WRONG *)
    
      (case e of NONE => true | SOME _ => false) (* BETTER *)
    
      not (isSome e)  (* BEST *)