Project 3: Semantic checking

Due: Friday, October 28


In this project you will implement the semantic checker for Tiger programs. The checker has three main jobs:

  1. Enter information into symbol tables (environments) to properly reflect the scoping rules.
  2. Check the types of all symbols and expressions.
  3. Implement various ad-hoc checks, such as making sure break occurs inside a loop.

The checker does not produce any output, but should emit meaningful messages for programs that have errors.

Getting started

Start by downloading check.sml, which will give you a bunch of starter code for the semantic checker. Add check.sml to your, and add a call to Check.checkProgram absyn to your driver (right after the file is closed).

The starter code should compile and be able to check very simple programs right off the bat. All of the places where you need to implement something are stubbed out with a raise NotImplemented expression.

All of the examples below are already in the starter code.

Project Description

The overall structure of the semantic checker is a set of mutually recursive functions that walk over the asbtract syntax tree, entering information into environments and checking uses of variables and expressions. There are some interesting details, which are discussed below.

Representation of types

The type checker introduces its own representation of types, which makes it easier to create and manipulate types:

type unique = unit ref
            | RECORDTY of (symbol * ty ) list * unique
            | ARRAYTY of ty * unique
            | NAMETY of symbol * ty option ref

One interesting thing to note is the use of the "unique" type -- its only purpose is to make each declaration of an array or record type unique regardless of whether or not they contain the same internal structure. In essence, it prohibits structural equality. It works because ref unit has no value other than a unique pointer.

The other thing to note is the ty option ref for the named types. Since types can be recursive, we need the ability to record named types before we know what they refer to. So, NAMETY("foo", ref(NONE)) is a type name foo with no definition yet, while NAMETY("foo", ref(SOME(t))) is a complete type.

Symbol tables (environments)

You will record type information in symbol tables or environments (the two terms mean the same thing, but you will find "symbol tables" in the old-school compiler community and "environments" in the functional programming community). I am providing you with a very simple implementation of symbol tables based on lists. The advantage of this implementation is that it is easy to support shadowing correctly (an identifier in a sub-scope that hides one with the same name in an outer scope). The interface is very simple:

type 'a symtab = (string * 'a) list
fun bind : string * 'a * 'a symtab -> 'a symtab
fun lookup : string * 'a symtab -> 'a option

Note that the lookup function returns an "option" value, which can be either SOME(v), if the symbol is found, or NONE if it isn't found.

Tiger has two name spaces: one for types and one for variables and functions. The type environment is just a mapping from symbols to types (instances of datatype ty). The variable/function environment needs more information, since we do not have an explicit representation of function types. The FUNENTRY contains the argument types and the result type:

datatype enventry = VARENTRY of ty
                  | FUNENTRY of ty list * ty

The type checker will carry around (and modify, when necessary) two environments:

val type_env : ty symtab
val var_env  : enventry symtab

Type-checking expressions

The main driving function in the checker is checkExp, which takes an expression, a variable environment, and a type environment, and computes the type of the expression:

val checkExp : AST.exp * enventry symtab * ty symtab -> ty

Since expressions are usually evaluated entirely in the same environments, it is often convenient to define a local function that performs the recursion, using the environments that are in the outer scope:

fun checkExp (exp, venv, tenv) =
  let fun check (AST.NIL) = NILTY
       |  check (AST.INT(value, pos, line)) = INTTY
       |  check ...etc...

The essence of type-checking expressions can be seen in the case for binary operations. We first compute the types of the subexpressions, then check to see if those types are compatible with the operation, and finally return the result type:

       |  check (AST.BINOP(e1, PLUS, e2)) =
                let val e1_ty = check e1
                    val e2_ty = check e2
                    if isIntTy e1_ty andalso isIntTy e2_ty then e1_ty
                       else raise SemanticError "Attempt to apply arithmetic op to non-integers"

The function isIntTy uses a simple case analysis to return true when the given type is INTTY.

Look carefully at Section 4 of the Tiger spec to determine the remaining cases. Some of the interesting ones:

One final issue: the way Tiger is defined, two types are equal when they are the same instance of a ty. There is no need for a complex eqType function, since we can just compare two types using =. However, type declarations of the form type a = b make "a" and "b" names for the same type. I recommend that you define a function called actualType that takes a type name and follows any chains of type declarations to the "real" type.

Type-checking declarations

Declarations introduce new types, variables, and functions into the environment. The only real trick in Tiger is that types and functions can be recursive, which requires the type checker to enter their names into the environment before checking their definitions.

The central function for processing declarations is enterDecls, which takes a list of declarations and the environments, and returns new environments with the declarations added. As with checkExp, it is convenient to define a local function to handle individual declarations. The essence of this function is either computing or looking up the declared type and then binding the symbol name to that type in the environment.

enterDecls (decllist, venv0, tenv0) =
    let fun enter (AST.TYDECL(sym, tyex), (venv, tenv)) =
                let val ty = transType (tyex, tenv)
                in  (venv, bindSym(sym, NAMETY(sym, ref(SOME(ty))), tenv))
         |  enter (AST.VARDECL(sym, SOME(tyname), exp), (venv, tenv)) =
                let val expty  = checkExp (exp, venv, tenv)
                    val declty = lookupSym(tyname, tenv)
                in  case declty of
                      SOME(ty) => if ty = expty then (bindSym(sym, VARENTRY(ty), venv), tenv)
                                  else raise SemanticError ((syminfo sym) ^ " Declared type does not match initializer")
                    | NONE     => raise SemanticError ((syminfo sym) ^ " Unknown type " ^ (nameof tyname))

IMPORTANT: the strategy in the starter code will not work for recursive types and recursive functions. You need to modify enterDecls to make two passes over the declarations: one to enter the names of types and functions into the environment, and a second pass to actually check the definitions. You can start by completing the simpler implementation given in the starter code, then modify it to handle recursion.

Translating types

The type checker has its own representation of types that is more suitable for type checking. You will need to complete the function called transType that takes an AST type expression and converts it to the checker's form.

Submitting your work

Submit your work the same way you have for the other two projects. The only code you need to submit is check.sml. In the description file please describe any interesting problems you ran into and how you solved them. Describe how you handled mutually recursive types and functions. Please submit any interesting test cases that you developed. Make sure you write some tests that you expect to fail the type checker!

provide comp181 checker check.sml description.txt

Back to Comp181