Tiger Programming Language

This semester we will implement a compiler for Andrew Appel's Tiger programming language. This language is the source language is his series of books, "Modern Compiler Implementation in ...". We will not be following the book, though. We are just using the syntax and semantics of the language as a guide.

Language Syntax

Lexical elements

An identifier is a sequence of letters, digits, and underscores that starts with a letter. Case is significant.

Whitespace (spaces, tabs, newlines, returns, and formfeeds) may appear anywhere between tokens and is ignored.

A comment starts with /* and ends with */ (like C). Comments can be nested.

An integer literal is a sequence of one or more decimal digits. There are no negative integer literals -- negative numbers are specified by applying unary minus to a positive literal.

A string literal is a sequence of zero or more printable characters, spaces, or escape sequences surrounded by double quotes. Each escape sequence starts with a backslash and stands for some sequence of special characters:

\"Double quote
\dddThe character with ASCII code ddd (three decimal digits)
\...\Any sequence of whitespace characters enclosed in backslashes is ignored. This allows string literals to span multiple lines by ending on one line and restarting on the following line with backslashes

Reserved words in the language are: array break do else end for function if in let nil of then to type var while.

Delimiters and operators consist of the following one or two-character sequences: ( ) [ ] { } : ; , . + - * / = <> < <= > >= & | :=

Grammatical syntax

A Tiger program consists of a single expression. The let construct allows a program to include multiple functions that call eachother.

In the grammar below, we use the following conventions. We use colors/fonts for non-terminals, token classes (like identifiers), and keywords and delimiters. The BNF operators are , |, *, +, and [ ] (to denote optional parts). The * operator has a special notation that can include a delimiter character. For example, exp*; denotes zero or more expressions separated by commas.

Overall program structure:


Kinds of expresions:

sequence( exp *; )
negation- exp
funcallid ( exp *, )
infixexp infixop exp
arrCreatetypid [ exp ] of exp
recCreatetypid { field *, }
fieldid = exp
assignlvalue := exp
ifthenelseif exp then exp [ else exp ]
whilewhile exp do exp
forfor id := exp to exp do exp
letlet dec+ in exp *; end
subscriptlvalue [ exp ]
fieldexplvalue . id

Declarations and types:

tydectype typid = ty
arrtyarray of typid
recty{ fielddec *, }
fieldecid : typid
vardecvar id [ : typid ] := exp
fundecfunction id ( fielddec *, ) [: typid ] = exp