Project 1: Scanner (Lexer)

Due: Friday, September 23, 2016


Overview

In this project you will build the scanner for Appel's Tiger programming language. You will use the ML-Lex tool to generate your scanner from a high-level specification, which greatly reduces the amount of effort required. Your main job will be to determine regular expressions for the various lexical classes, write a tiger.lex file that specifies these regular expressions and the actions to perform for each one.

Scanner specification

Getting Started

Start by studying the lexical structure of the language as described in the specification document. For each kind of token figure out a regular expression that matches it. Download the skeleton code in this tar file. It contains the following file:

The lex file

All of your work will go into tiger.lex. The structure of the file follows a very old format inherited from the original lex tool for C. There have been many follow up tools for various languages, including flex, JLex, and ml-lex, which is the tool we'll use. You can find detailed documentation at the main ML-Lex page.

Your main job is to add regular expressions for each Tiger construct. With each regular expression you supply an action, which is typically to make and return a token. Some constructs require other actions as well. Look carefully at the language specification and decide what you need to do.

Add whatever other code and declarations you need. You might find it useful to look into explicit start states, which allow you to add your own named states into the DFA. A given regular expression can be qualified by an explicit state: it only matches in that state. You can transition explicitly between these states in the actions.

Building and testing

The code I'm giving you is designed so that you can build and test the lexer by itself (i.e., without the parser and the rest of the compiler). The driver just opens a file, and reads and prints each of the tokens it sees.

To build and test the lexer you run sml on the sources.cm file, which specifies all the source files needed. The compilation manager knows to run ml-lex on the lexer file automatically, so there's not much to do. Here is an example run (key: black for prompts, blue for stdout, red for my typing):

sunfire63{sguyer}1211: cat test.tig
/* error : formals and actuals have 
  /*   foo bar */
  different types */
let
/*/**/*/
function g (a:int , b:string):int = a
in
/**/

g("one", "two")
end
sunfire63{sguyer}1211: sml sources.cm
Standard ML of New Jersey v110.78 [built: Tue Jun  9 12:48:26 2015]
[scanning sources.cm]
[parsing (sources.cm):tokens.sml]
[attempting to load plugin $/lex-ext.cm]
[library $/lex-ext.cm is stable]
[library $smlnj/cm/tools.cm is stable]
[library $smlnj/internal/cm-lib.cm is stable]
[library $SMLNJ-BASIS/basis.cm is stable]
[plugin $/lex-ext.cm loaded successfully]
[attempting to load plugin $/mllex-tool.cm]
[library $/mllex-tool.cm is stable]
[plugin $/mllex-tool.cm loaded successfully]
- Driver.parse "test.tig";
[autoloading]
[autoloading done]
LET  @4
FUNCTION  @6
ID = g  @6
LPAREN  @6
ID = a  @6
COLON  @6
ID = int  @6
COMMA  @6
ID = b  @6
COLON  @6
ID = string  @6
RPAREN  @6
COLON  @6
ID = int  @6
EQ  @6
ID = a  @6
IN  @7
ID = g  @10
LPAREN  @10
STRING = "one"  @10
COMMA  @10
STRING = "two"  @10
RPAREN  @10
END  @11
EOF  @0
val it = () : unit
-

If you have errors in your lexer specification you'll find out right when SML loads, as it will try to run ML-Lex as soon as it loads the cm file. If you have errors in support code, you might only see those errors when you actually run Driver.parse.

Requirements

This project is relatively easy, but there are a few things to look out for:

You can find many test cases in Appel's directory of Tiger programs.

Submitting your work

Submit your scanner using provide. Please include a plain-text file describing (a) who you worked with, (b) how you handled comments, and (c) how you handled errors.

provide comp181 lexer tiger.lex readme.txt


Back to Comp181