Overview

In this assignment you will implement a constraint-based inference algorithm for the Hindley-Milner type system, which represents the current best practice for flexible static typing. The assignment has two purposes:

Setup

Clone the book code:

git clone linux.cs.tufts.edu:/comp/105/build-prove-compare

The code you need is in bare/nml/ml.sml.

Dire warning

Except possibly as an argument to map (which we recommend against), none of the code you write may use fst or snd.1 You may not define and use a helper function with the same contract as fst or snd. Submissions violating this rule will earn No Credit.

What are you supposed to do? Pattern match:

val (left, right) = ... expression that evaluates to a pair ...

Reading comprehension (10%)

These problems will help guide you through the reading. We recommend that you complete them before starting the other problems below. You can download the questions.

  1. Read sections 7.3.2 and 7.4.1, which start on pages page 480 and page 481, respectively.
    We have seen the symbols ρ, τ, and σ before, but not used exactly in this way.

    Here is a list of semantic and type-related concepts you have seen written using single symbols:

    • an expression
    • a name
    • a location
    • a value
    • a type
    • a type scheme (new in this chapter)
    • a mapping from names to locations
    • a mapping from names to values
    • a mapping from names to types
    • a mapping from names to type schemes (new in this chapter)

    There are lots of concepts and only so many symbols to go around. Please identify, from the preceding list, what each symbol stands for in the theory of nano-ML:

    1. ρ

    2. τ

    3. σ

    4. Γ

    And finally,

    1. Say briefly what, in nano-ML, is the difference between τ and σ:

    You are preparing for exercise 19.

  2. Read the first two pages of section 7.4.3, which explain “substitutions” and “instances.”

    1. Yes or no: does the substitution (α → sym) ∘ (β → bool) ∘ (γ → int) replace type variable α with type sym?

    2. Yes or no: does the substitution (α → sym) ∘ (β → bool) ∘ (γ → int) replace type variable β with type bool?

    3. Yes or no: does the substitution (α → sym) ∘ (β → bool) ∘ (γ → int) leave the type γ list unchanged?

    4. Which of the following are instances of the polymorphic type scheme ∀α . α list → int? For each one, please indicate whether it is an instance of the type scheme (True) or whether it is not an instance of the type scheme (False).

      int list True or False

      int list list True or False

      int list list → int True or False

      int * int list → list True or False

    You have a foundation on which to get ready for exercises 18 and S.

  3. Read the first page of section 7.5.2, which shows the form of a constraint. Then skip to the first page of section 7.5.3, which explains how to apply a substitution to a constraint.

    We start with a substitution θ and a constraint C:

    θ = (α₁ ↦ int)

    C = α₁ ~ α₂ /\ α₂ ~ α₃ list /\ α₄ ~ α₃ list list.

    Now define C = θ(C).

    1. Write C:

    2. Does C have a solution? Answer yes or no.

    Now define C as the result of applying substitution (α₂ ↦ int) to C.

    1. Write C:

    2. Does C have a solution? Answer yes or no.

    You are getting ready for exercises 18 and S.

  4. Now read all of section 7.5.3, which explains how to solve constraints.

    To demonstrate your understanding, reason about solving these four constraints:

    C₁ = α ~ int

    C₂ = α ~ bool

    C₃ = C₁ /\ C₂

    C₄ = α₁ ~ α₂ /\ α₂ list ~ α₁

    1. Write a substitution θ₁ that solves constraint C₁:

    2. Write a substitution θ₂ that solves constraint C₂:

    3. Does the composition θ₂ ∘ θ₁ solve constraint C₃ = C₁ /\ C₂? Answer yes or no.

    4. Can constraint C₃ be solved? Answer yes or no.

    5. Can constraint C₄ be solved? Answer yes or no.

    You are ready for excercises 18 and S.

  5. Read the paragraphs that describe the nondeterministic typing rules for lambda and for “Milner’s Let”, which you will find on page 489. Especially, read the small paragraph following the lambda rule.

    Now look at the val definition of too-poly in code chunk 489. The right-hand side of the val definition is a lambda expression with the name empty-list playing the role of x₁.

    1. The rule for lambda says that we can pick any type τ₁ for empty-list. After we’ve chosen τ₁, what is the type scheme to which empty-list (playing x₁) is bound in the extended environment which is used to check e? (Hint: this type scheme comes from the lambda rule, as per the discussion in the small paragraph, and it is different from the type scheme of the empty-list that appears in the top-level val binding.)

    2. Given that the rule for lambda says that we can pick any type τ₁ for empty-list, why can’t we pick a τ₁ that makes the lambda expression type-check? Put the word YES next to the best explanation:

      • Parameter empty-list has to have type (forall ('a) (list 'a)), but τ₁ is not a forall type.

      • Parameter empty-list has type τ₁ = (list 'a), which is not the same as (list bool).

      • Parameter empty-list can have any type τ₁ but no τ₁ can be equivalent to both (list int) and (list bool).

      • Parameter empty-list has type τ₁ = (list bool), which is not the same as (list int).

      • Parameter empty-list has type τ₁ = (list int), which is not the same as (list bool).

  6. Now look at the definition of not-too-poly in code chunk 490. The right-hand side is an example of Milner’s let with empty-list playing the role of x, the literal '() playing the role of e, and an application of pair playing the role of e. Suppose that Γ ⊢ '() : β list, where β is a type variable that does not appear anywhere in Γ. That is to say, the literal '() is given the type β list, which is playing the role of τ’.

    1. If τ’ is β list, what are its free type variables?

    2. What set plays the role of { α₁, ..., αₙ }, which is ftv(τ’)-ftv(Γ)?

    3. What is the type scheme to which empty-list (playing x) is bound in the extended environment which is used to check e?

Exercises to do on your own (10%)

On your own, please work exercise 1 on page 530 and exercise 2 on page 530 of Build, Prove, and Compare. These exercises explore some implications of type inference.

1. Exploring the meaning of polymorphic types I. Do exercise 1 on page 530 of Build, Prove, and Compare.

Related reading: If you need to review quantified types, look at section 6.6.3.

2. Exploring the meaning of polymorphic types II. Do exercise 2 on page 530 of Build, Prove, and Compare.

Related reading: To familiarize yourself with the type system, read section 7.4.1. The rules for evaluating definitions are explained in section 7.3.2.

Exercises you may do with a partner (80%)

Either on your own or with a partner, please work Exercises 3, 18, 19, and 20 from pages 530 to 534 of Build, Prove, and Compare, and the two exercises S and T below.

3. Algorithmic rules for Begin and Lambda. Do exercise 3 on page 530 of Build, Prove, and Compare. This exercise fills in a key step between the nondeterministic rules in the book and the deterministic rules you will need to implement type inference.

Hints:

Related reading: The first part of section 7.5.2, which starts on page 496, up to and including the part labeled “Converting nondeterministic rules to use constraints.”

18. Implementing a constraint solver. Do exercise 18 on page 533 of Build, Prove, and Compare. This exercise is probably the most difficult part of the assignment. Before proceeding with type inference, make sure your solver produces the correct result on our test cases and on your test cases. You may also want to show your solver code to the course staff.

Related reading:

  • Section 7.4.1, which starts on page 481. It will familiarize you with the type system.

  • The second bullet in the opening of section 7.5, which introduces constraints.

  • The opening of section 7.5.2, which starts on page 498. This section explains constraints and shows them in the typing rules. If you understand the constraint-based IF rule, in both its simple form and its TypesOf form, you can stop there.

  • The explanation of constraint solving in section 7.5.3, which starts on page 507.

  • The table showing the correspondence between nano-ML’s type sytem and code on page 510.

  • The definition of con and the utility functions in section 7.6.4, which starts on page 514.

  • The definition of function solves on page 516, which you can use to verify solutions your solver claims to find.

S. Test cases for the solver. In file solver-tests.sml, write three test cases for the constraint solver. At least two of these test cases should be constraints that have no solution. Assuming that we provide a function constraintTest : con -> answer, write your test cases as three successive calls to constraintTest. Do not define constraintTest yourself.

Here is a sample set of test cases:

val _ = constraintTest (TYVAR "a" ~ TYVAR "b" /\ TYVAR "b" ~ TYCON "bool")
val _ = constraintTest (CONAPP (TYCON "list", [TYVAR "a"]) ~ TYCON "int")
val _ = constraintTest (TYCON "bool" ~ TYCON "int")

Naturally, you will supply your own test cases, different from these.

You can typecheck your file on the Unix command line by running

105-check-constraints solver-tests.sml

19. Implementing type inference. Do exercise 19 on page 534 of Build, Prove, and Compare.

Related reading:

  • The nondeterministic typing rules of nano-ML, which start on page 488 of Build, Prove, and Compare.

  • The constraint-based typing rules in section 7.5.2

  • The summaries of the typing rules from page 537 to page 538

  • Explanation and examples of check-type and check-principal-type in section 7.4.6, which starts on page 491

T. Test cases for type inference. Create a file type-tests.nml, and in that file, write three unit tests for nano-ML type inference. At least two of these tests must use check-type-error. The third may use either check-type-error or check-principal-type. If you wish, your file may include val bindings or val-rec bindings of names used in the tests. Your file must load and pass all tests using the reference implementation of nano-ML:

  nml -q < type-tests.nml

If you submit more than three tests, we will use only the first three.

Here is a complete example type-tests.nml file:

(check-type-error (lambda (x y z) (cons x y z)))
(check-type-error (+ 1 #t))
(check-type-error (lambda (x) (cons x x)))

You will supply your own test cases, different from these.

Related reading:

  • Concrete syntax for types and for unit tests, in Figure 7.1 on 476

  • As above, the explanation and examples of check-type and check-principal-type in section 7.4.6, which starts on page 491.

20. Adding primitives. Do exercise 20 on page 534 of Build, Prove, and Compare.

Related reading: Read about primitives in section 7.6.7.

Extra Credit

For extra credit, you may complete any of the following:

Of these exercises, the most interesting are probably Mutation (easy) and Explicit types (not easy).

What and how to submit: Individual problems

Submit these files:

As soon as you have the files listed above, run submit105-ml-inf-solo to submit a preliminary version of your work. Keep submitting until your work is complete; we grade only the last submission.

What and how to submit: Pair problems

Submit these files:

As soon as you have the files listed above, run submit105-ml-inf-pair to submit a preliminary version of your work. Keep submitting until your work is complete; we grade only the last submission.

Hints and guidelines

Building the standalone interpreter

If you call your interpreter ml.sml, you can build a standalone version in a.out by running mosmlc ml.sml or a faster version in ml by running mlton -output a.out ml.sml. Than you can run it with ./a.out and you can run tests by

./a.out -q < type-tests.nml         # required
./a.out -q < more-type-tests.nml    # optional

The constraint solver

To help you with the solver, once you have implemented solve, the following code redefines solve into a version that checks itself for sanity (ie, idempotence). It is a good idea to check that the substitution returned by your solver is idempotent before using it in your type inferencer.

fun isIdempotent pairs =
    let fun distinct a' (a, tau) = a <> a' andalso not (member a' (freetyvars tau))
        fun good (prev', (a, tau)::next) =
              List.all (distinct a) prev' andalso List.all (distinct a) next
              andalso good ((a, tau)::prev', next)
          | good (_, []) = true
    in  good ([], pairs)
    end

val solve =
    fn c => let val theta = solve c
            in  if isIdempotent theta then theta
                else raise BugInTypeInference "non-standard substitution"
            end

Type inference

With your solver in place, type inference should be mostly straightforward.

Follow the same step-by-step procedure you used to build your type checker for Typed μScheme. In particular,

It pays to create a lot of regression tests, of both the check-principal-type and the check-type-error variety. (The check-type test also has its place, but for this assignment, you want to stick to check-principal-type.) The most effective tests of your algorithm will use check-type-error. Assigning types to well-typed terms is good, but most mistakes are made in code that should reject an ill-typed term, but doesn’t. Here are some examples of the sorts of tests that are really useful:

(check-type-error (lambda (x) (cons x x)))
(check-type-error (lambda (x) (cdr (pair x x))))

Once your interpreter is rejecting ill-typed terms, if it can process the predefined functions and infer their principal types correctly, you are doing well. As a larger integration test, I have posted a functional topological sort. It contains some type tests as well as a check-expect.

Avoid common mistakes

A common mistake is to create too many fresh variables or to fail to constrain your fresh variables.

Another surprisingly common mistake is to include redundant cases in the code for inferring the type of a list literal. As is almost always true of functions that consume lists, it’s sufficient to write one case for NIL and one case for PAIR.

It’s a common mistake to define a new exception and not handle it. If you define any new exceptions, make sure they are handled. It’s not acceptable for your interpreter to crash with an unhandled exception just because some nano-ML code didn’t type-check.

It’s not actually a common mistake, but don’t try to handle the exception BugInTypeInference. If this exception is raised, your interpreter is supposed to crash.

It’s a common mistake to disable the predefined functions for testing and then to submit your interpreter without re-enabling the predefined functions. Ouch!

It’s a common mistake to call ListPair.foldr and ListPair.foldl when what you really meant was ListPair.foldrEq or ListPair.foldlEq.

It is a mistake to assume that an element of a literal list always has a monomorphic type.

It is a mistake to assume that begin is never empty.

How your work will be evaluated

Your constraint solving and type inference will be evaluated through extensive testing. We must be able to compile your solution in Moscow ML by typing, e.g.,

mosmlc ml.sml

If there are errors or warnings in this step, your work will earn No Credit for functional correctness.

We will focus the rest of our evaluation on your constraint solving and type inference.

Names

We expect you to pay attention to names:
Exemplary Satisfactory Must improve
Names

• Type variables have names beginning with a; types have names beginning with t or tau; constraints have names beginning with c; substitutions have names beginning with theta; lists of things have names that begin conventionally and end in s.

• Types, type variables, constraints, and substitutions mostly respect conventions, but there are some names like x or l that aren’t part of the typical convention.

• Some names misuse standard conventions; for example, in some places, a type variable might have a name beginning with t, leading a careless reader to confuse it with a type.

Code structure

We expect you to pay even more attention to structure. Keep the number of cases to a minimum!
Exemplary Satisfactory Must improve
Structure

• The nine cases of simple type equality are handled by these five patterns: TYVAR/any, any/TYVAR, CONAPP/CONAPP, TYCON/TYCON, other.

• The code for solving α ∼ τ has exactly three cases.

• The constraint solver is implemented using an appropriate set of helper functions, each of which has a good name and a clear contract.

• Type inference for list literals has no redundant case analysis.

• Type inference for expressions has no redundant case analysis.

• In the code for type inference, course staff see how each part of the code is necessary to implement the algorithm correctly.

• Wherever possible appropriate, submission uses map, filter, foldr, and exists, either from List or from ListPair

• The nine cases are handled by nine patterns: one for each pair of value constructors for ty

• The code for α ∼ τ has more than three cases, but the nontrivial cases all look different.

• The constraint solver is implemented using too many helper functions, but each one has a good name and a clear contract.

• The constraint solver is implemented using too few helper functions, and the course staff has some trouble understanding the solver.

• Type inference for list literals has one redundant case analysis.

• Type inference for expressions has one redundant case analysis.

• In some parts of the code for type inference, course staff see some code that they believe is more complex than is required by the typing rules.

• Submission sometimes uses a fold where map, filter, or exists could be used.

• The case analysis for a simple type equality does not have either of the two structures on the left.

• The code for α ∼ τ has more than three cases, and different nontrivial cases share duplicate or near-duplicate code.

• Course staff cannot identify the role of helper functions; course staff can’t identify contracts and can’t infer contracts from names.

• Type inference for list literals has more than one redundant case analysis.

• Type inference for expressions has more than one redundant case analysis.

• Course staff believe that the code is significantly more complex than what is required to implement the typing rules.

• Submission includes one or more recursive functions that could have been written without recursion by using map, filter, List.exists, or a ListPair function.


  1. These functions are defined and used in the interpreter only to be passed to higher-order functions. They are never called directly.