In this project you will build the parser for Appel's Tiger programming language. You will use the ML-Yacc tool to generate an LALR parser automatically from a grammar specification. Your main job is to design a grammar and associated semantic actions needed to build an abstract syntax tree during parsing.
Start by studying the grammatical structure of the language as described in thespecification document. Download the skeleton code in this tar file. It contains the following files:
tiger.grm: This is the main grammar file -- it is the only file you should need to modify.
driver.sml: The parser test driver -- note that it is different from the driver in project 1.
ast.sml: The abstract syntax tree structure for Tiger
sources.cm: The compilation manager file (kind of like a Makefile).
NOTE:You will need to make a few modifications to your lexer so that
it can be hooked up to the parser. Start by uncommenting the "glue code" section
at the start of the file. Next, remove the old definition of
lexresult type. Finally, uncomment the
declaration (later in the file). I noticed an error in where I placed this
comment -- please move the
%header declaration after the %%
(it should come immediately before the
%arg declaration). Sorry for
If for some reason you were not able to get the lexer working, I will post a solution that you can download.
The specification of the Tiger language is given in a high-level BNF. This
description is pretty typical of language specifications -- the specification for C
given at the end of the classic Kernighan and Ritchie book is similar. Read it
carefully and design a precise grammar for the language. Enter the grammar in
tiger.grm file, which serves as input to the ML-Yacc parser
Start out by leaving the semantic actions for each production empty and focus on getting the grammar right. Once the grammar works on a variety of test cases then start adding ML code to build the AST. You can find many test cases in Appel's directory of Tiger programs.
Use the same error-reporting machinery from the lexer assignment. Make sure your parser properly reports the locations of syntax errors.
Don't be surprised if you need to try several different grammars before you settle on one that works. There are some tricky parts!
You should be able to build the system by typing
on the command line. Test the parser using the driver
Driver.parse "filename.tig". As with the lexer, you may
find that some errors don't show up until you actually try to run the
NOTE: The grammar specification has a separate token class for identifiers and type names. In practice, these are the same kind of token in the lexer. The distinction will be important when we perform semantic checking on the abstract syntax tree.
Your parser should build a representation of the program using the following datatypes (from ast.sml):
type symbol = string * pos * line * int ref datatype exp = NIL | INT of int * pos * line | STRING of string * pos * line | VAR of symbol | DOT of exp * symbol | SUBSCRIPT of exp * exp | SEQ of exp list | NEG of exp | CALL of symbol * exp list | BINOP of exp * oper * exp | MAKEARR of symbol * exp * exp | MAKEREC of symbol * ( symbol * exp ) list | ASSIGN of exp * exp | IF of exp * exp * exp option | WHILE of exp * exp | FOR of symbol * exp * exp * exp | LET of decl list * exp list | BREAK and decl = TYDECL of symbol * tyex | VARDECL of symbol * symbol option * exp | FUNDECL of symbol * decl list * symbol option * exp | NODECL and tyex = TYID of symbol | ARRAY of symbol | RECORD of decl list and oper = PLUS | MINUS | TIMES | DIV | EQ | NEQ | LT | LE | GT | GE | AND | OR
The mapping from grammar constructs to AST nodes should be fairly straightforward, with a few exceptions. First, I eliminated the distinction between "expressions" and "lvalues". They are all expressions. Since the parser enforces the rule that the left-hand side of an assignment must be an assignment, there is no need represent it here. It will make our lives easier later (trust me!).
Second, identifiers are represented by the type
symbol, which is
just a name (and position) with an
int ref added. For now, the
integer reference plays no role -- we just pass
ref 0 to the
constructor. We will use this reference during semantic checking to resolve
I am still working on a pretty-printer for the AST, but I'll send you that code as soon as it's ready.
Submit your parser using
provide. Please include a plain-text
file describing (a) how you handled shift-reduce conflicts, (b) anyone you
dicussed the problem with, and (c) what parts were the most challenging.
provide comp181 parser tiger.grm description.txt
Back to Comp181