Project 1: Scanner (Lexer)

Due: Friday, September 23, 2016


In this project you will build the scanner for Appel's Tiger programming language. You will use the ML-Lex tool to generate your scanner from a high-level specification, which greatly reduces the amount of effort required. Your main job will be to determine regular expressions for the various lexical classes, write a tiger.lex file that specifies these regular expressions and the actions to perform for each one.

Scanner specification

Getting Started

Start by studying the lexical structure of the language as described in the specification document. For each kind of token figure out a regular expression that matches it. Download the skeleton code in this tar file. It contains the following file:

The lex file

All of your work will go into tiger.lex. The structure of the file follows a very old format inherited from the original lex tool for C. There have been many follow up tools for various languages, including flex, JLex, and ml-lex, which is the tool we'll use. You can find detailed documentation at the main ML-Lex page.

Your main job is to add regular expressions for each Tiger construct. With each regular expression you supply an action, which is typically to make and return a token. Some constructs require other actions as well. Look carefully at the language specification and decide what you need to do.

Add whatever other code and declarations you need. You might find it useful to look into explicit start states, which allow you to add your own named states into the DFA. A given regular expression can be qualified by an explicit state: it only matches in that state. You can transition explicitly between these states in the actions.

Building and testing

The code I'm giving you is designed so that you can build and test the lexer by itself (i.e., without the parser and the rest of the compiler). The driver just opens a file, and reads and prints each of the tokens it sees.

To build and test the lexer you run sml on the file, which specifies all the source files needed. The compilation manager knows to run ml-lex on the lexer file automatically, so there's not much to do. Here is an example run (key: black for prompts, blue for stdout, red for my typing):

sunfire63{sguyer}1211: cat test.tig
/* error : formals and actuals have 
  /*   foo bar */
  different types */
function g (a:int , b:string):int = a

g("one", "two")
sunfire63{sguyer}1211: sml
Standard ML of New Jersey v110.78 [built: Tue Jun  9 12:48:26 2015]
[parsing (]
[attempting to load plugin $/]
[library $/ is stable]
[library $smlnj/cm/ is stable]
[library $smlnj/internal/ is stable]
[library $SMLNJ-BASIS/ is stable]
[plugin $/ loaded successfully]
[attempting to load plugin $/]
[library $/ is stable]
[plugin $/ loaded successfully]
- Driver.parse "test.tig";
[autoloading done]
LET  @4
ID = g  @6
ID = a  @6
ID = int  @6
ID = b  @6
ID = string  @6
ID = int  @6
EQ  @6
ID = a  @6
IN  @7
ID = g  @10
STRING = "one"  @10
COMMA  @10
STRING = "two"  @10
END  @11
EOF  @0
val it = () : unit

If you have errors in your lexer specification you'll find out right when SML loads, as it will try to run ML-Lex as soon as it loads the cm file. If you have errors in support code, you might only see those errors when you actually run Driver.parse.


This project is relatively easy, but there are a few things to look out for:

You can find many test cases in Appel's directory of Tiger programs.

Submitting your work

Submit your scanner using provide. Please include a plain-text file describing (a) who you worked with, (b) how you handled comments, and (c) how you handled errors.

provide comp181 lexer tiger.lex readme.txt

Back to Comp181