Project 2: Parser

Due: Tuesday, October 11, 2016


In this project you will build the parser for Appel's Tiger programming language. You will use the ML-Yacc tool to generate an LALR parser automatically from a grammar specification. Your main job is to design a grammar and associated semantic actions needed to build an abstract syntax tree during parsing.

Parser specification

Getting Started

Start by studying the grammatical structure of the language as described in thespecification document. Download the skeleton code in this tar file. It contains the following files:

NOTE:You will need to make a few modifications to your lexer so that it can be hooked up to the parser. Start by uncommenting the "glue code" section at the start of the file. Next, remove the old definition of the lexresult type. Finally, uncomment the %header declaration (later in the file). I noticed an error in where I placed this comment -- please move the %header declaration after the %% (it should come immediately before the %arg declaration). Sorry for the confusion!

If for some reason you were not able to get the lexer working, I will post a solution that you can download.

The grm file

The specification of the Tiger language is given in a high-level BNF. This description is pretty typical of language specifications -- the specification for C given at the end of the classic Kernighan and Ritchie book is similar. Read it carefully and design a precise grammar for the language. Enter the grammar in the tiger.grm file, which serves as input to the ML-Yacc parser generator.

Start out by leaving the semantic actions for each production empty and focus on getting the grammar right. Once the grammar works on a variety of test cases then start adding ML code to build the AST. You can find many test cases in Appel's directory of Tiger programs.

Use the same error-reporting machinery from the lexer assignment. Make sure your parser properly reports the locations of syntax errors.

Don't be surprised if you need to try several different grammars before you settle on one that works. There are some tricky parts!

You should be able to build the system by typing sml" on the command line. Test the parser using the driver function, Driver.parse "filename.tig". As with the lexer, you may find that some errors don't show up until you actually try to run the parser.

NOTE: The grammar specification has a separate token class for identifiers and type names. In practice, these are the same kind of token in the lexer. The distinction will be important when we perform semantic checking on the abstract syntax tree.

Abstract syntax tree

Your parser should build a representation of the program using the following datatypes (from ast.sml):

type symbol = string * pos * line * int ref

datatype exp = NIL
             | INT of int * pos * line
             | STRING of string * pos * line
             | VAR of symbol 
             | DOT of exp * symbol
             | SUBSCRIPT of exp * exp
             | SEQ of exp list
             | NEG of exp
             | CALL of symbol * exp list
             | BINOP of exp * oper * exp
             | MAKEARR of symbol * exp * exp
             | MAKEREC of symbol * ( symbol * exp ) list
             | ASSIGN of exp * exp
             | IF of exp * exp * exp option
             | WHILE of exp * exp
             | FOR of symbol * exp * exp * exp
             | LET of decl list * exp list
             | BREAK

and decl = TYDECL  of symbol * tyex
         | VARDECL of symbol * symbol option * exp
         | FUNDECL of symbol * decl list * symbol option * exp
         | NODECL

and tyex = TYID of symbol
         | ARRAY of symbol
         | RECORD of decl list

and oper = PLUS | MINUS | TIMES | DIV | EQ | NEQ | LT | LE | GT | GE | AND | OR

The mapping from grammar constructs to AST nodes should be fairly straightforward, with a few exceptions. First, I eliminated the distinction between "expressions" and "lvalues". They are all expressions. Since the parser enforces the rule that the left-hand side of an assignment must be an assignment, there is no need represent it here. It will make our lives easier later (trust me!).

Second, identifiers are represented by the type symbol, which is just a name (and position) with an int ref added. For now, the integer reference plays no role -- we just pass ref 0 to the constructor. We will use this reference during semantic checking to resolve scopes.

I am still working on a pretty-printer for the AST, but I'll send you that code as soon as it's ready.

Submitting your work

Submit your parser using provide. Please include a plain-text file describing (a) how you handled shift-reduce conflicts, (b) anyone you dicussed the problem with, and (c) what parts were the most challenging.

provide comp181 parser tiger.grm description.txt

Back to Comp181