Comp11
Grammar for C++

Overview

In class today we developed a grammar for the parts of the C++ programming language that we've talked about so far. The grammar gives us a clear way to describe what is allowed in the language (and what is not allowed).

The way the grammar works is that it breaks a program down into syntactic categories, which describe what composes different parts of a program. We can use the same idea to describe a grammar for English: for example, a sentence is a syntactic category composed of other (smaller) categories, such as nouns, verbs, and adjectives. The grammar describes what is in these categories, and how to construct the larger structures from the smaller ones.

Another way you can use the grammar is to "diagram" a program -- just like diagramming a sentence in elementary school. You should be able to circle each part of your program and identify its syntactic category -- is it a statement? an expression? a declaration? a definition? -- and explain why it belongs where it is in the program.

The notation for grammar rules uses several special operators (colored in blue):

:= defines a rule and can be read as "is made up of".

| (vertical bar) separates choices in a rule.

[ ] (square brackets) enclose parts of a rule that are optional.

{ } (curly braces) enclose parts of a rule that may be repeated zero or more times.

Note that this notation is not part of C++; it is a notation for defining grammars (of any kind) based on Backus-Naur form. The square-bracket and curly-brace operators come from its extended variant, usually called EBNF. In the grammar below, the parts in brown are grammatical categories (like noun or verb in English) and the parts in black are called literals -- things you can actually write (like dog or run in English).

Program

The top-most grammatical rule defines what composes a program, at the highest level. Our programs consist of a #include, followed by zero or more struct definitions, zero or more function declarations, and zero or more function definitions. Here is how we write that in the grammar:

program := #include "comp11io.h"
{ structdef }
{ functiondecl }
{ functiondef }

Structure definitions

New: struct definitions introduce new types, so we generally place them at the top of the C++ source file. The "body" of the struct consists of a set of fields or members each of which has a type and a name. The name that follows the struct keyword will be the name of the new type.

structdef := struct name {
    { type name ; }
} ;
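
As a rough illustration of the structdef rule, here is a small struct with two fields, followed by a function that uses the new type. The names Date, month, day, and makedate are made up for this sketch and do not come from the class. Note that C++ requires a semicolon after the closing brace of a struct, and that the dot used to reach a field is not covered by the grammar rules in these notes.

```cpp
// A struct definition: the keyword struct, a name, then the fields in braces.
// Date, month, and day are hypothetical names chosen for illustration.
struct Date {
    int month;   // each field has the form: type name ;
    int day;
};               // C++ requires this semicolon

// Once Date is defined, it can be used wherever the grammar expects a type.
// makedate is a hypothetical helper, not part of the course code.
Date makedate(int m, int d)
{
    Date result;        // variable definition with a struct type
    result.month = m;   // field access uses a dot (not described by the grammar)
    result.day = d;
    return result;
}
```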

Function declarations and definitions

Both function declarations and function definitions have a head, so we can make a separate rule for that and use it twice. Notice the parameters rule: it says "the parameters are made up of one parameter, followed by zero or more occurrences of a comma followed by another parameter." (A name is any word starting with a letter followed by some number of letters and digits.)

head := type name( [ parameters ] )
parameters := parameter { , parameter }
parameter := type name
type :=   int
| double
| bool
| char
| string
| name          // Note: must be a struct type name

New: Notice that since we can define new types using structdef, the rule for types now includes the option for a name, which would have to be the name of a struct type.

Compare some examples of what is allowed and not allowed by these rules:

 
// NOT allowed -- missing types
bool isearlier(m1, d1, m2, d2);
 
// NOT allowed -- missing return type
isearlier(int m1, int d1, int m2, int d2);
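
For contrast, here is a version of the same head that the rules do allow: it has a return type, a name, and a type in front of every parameter name. The matching definition is included only so the fragment can be compiled and tried out; its logic (comparing months first, then days) is an assumption made for this sketch, not something specified in class.

```cpp
// Allowed: return type, name, and a type before every parameter name.
bool isearlier(int m1, int d1, int m2, int d2);

// A possible definition (assumed logic): the first date is earlier if its
// month is smaller, or if the months are equal and its day is smaller.
bool isearlier(int m1, int d1, int m2, int d2)
{
    if (m1 < m2) {
        return true;
    }
    if (m1 == m2) {
        return d1 < d2;
    }
    return false;
}
```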
 

Now we can define function declaration and definition. A function declaration is just the head followed by a semicolon. The definition includes the body.

functiondecl := head ;
functiondef := head
{
    body
}

Function body

The body consists of some mix of variable definitions and statements. Notice the way I've defined this rule: you can choose a variabledef or a statement, and that whole choice is wrapped in curly braces (meaning "repeat zero or more times"), allowing you to mix the two kinds of constructs in any order.

body := { variabledef  | statement }

Compare that rule to an alternative rule that forces you to have all your variable definitions first, then all your statements (this is how older versions of the C programming language work):

body := { variabledef }          // Note: forces all the variable definitions to come first
{ statement }
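
To make the difference concrete, here is a small sketch of a body that the first rule accepts but the alternative rule would reject, because a variable definition appears after a statement. The function absdiff and its variable names are hypothetical, chosen only for this example.

```cpp
// Hypothetical function: the absolute difference of two ints.
int absdiff(int a, int b)
{
    int diff = a - b;      // variabledef
    if (diff < 0) {        // statement (ifstmt)
        return b - a;
    }
    int result = diff;     // variabledef AFTER a statement:
                           //   fine under the first rule,
                           //   rejected by the "definitions first" rule
    return result;         // statement (returnstmt)
}
```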

New: In the last class we split a variable definition into two things: a variable declaration (with an optional initial value) and a variable assignment. We will keep the existing variabledef rule, but make the initial value optional. For variable assignment, we will add a new kind of statement below.

A variable definition associates the value of an expression with a variable. For now we will leave expression undefined -- informally, it is a mathematical formula of some sort. It could be as simple as a number or a variable, or a large complex computation. Expressions also include function application (or function call).

variabledef := type name [ = expression ] ;

Statements

The kinds of statements we've seen include the return statement, the if/else statement, and (new) the assignment statement. I'll use "body" inside the if statement, to indicate that any series of statements is allowed there. Notice the square brackets around the "else" part, indicating that it is optional.

statement :=   returnstmt
| ifstmt
| assignment
returnstmt := return expression;
ifstmt := if (expression) {
    body
} [ else {
    body
} ]
assignment := name = expression;
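
Here is a hedged sketch that uses all three kinds of statement together. The function sign and the variable label are invented for this example. Because the ifstmt rule above requires braces after else, the three-way choice is written as an if nested inside the else body rather than as "else if".

```cpp
#include <string>
using namespace std;

// Hypothetical function: describe the sign of x in words.
string sign(int x)
{
    string label;                // variabledef with no initial value
    if (x < 0) {                 // ifstmt with an else part
        label = "negative";      // assignment
    } else {
        if (x == 0) {            // nested ifstmt inside the else body
            label = "zero";
        } else {
            label = "positive";
        }
    }
    return label;                // returnstmt
}
```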

Examples

Here are some code fragments showing legal and illegal code:

 
// Define the function myfunc
//   OK: a program is a series of function definitions
int myfunc(int val)
{
  // Call (apply) a function called otherfunc
  //   OK: the body may include variable definitions
  int x = otherfunc(val);
 
  // Define a function called otherfunc
  //   NOT OK: the body may not contain function definitions
  int otherfunc(int val) {
    return val + 10;
  }
}
// Define a function called otherfunc
//   OK: this is the right place for a function definition
int otherfunc(int val) {
  return val + 10;
}
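
One way to repair the fragment above, sketched under a couple of assumptions: the nested definition is removed, a declaration of otherfunc is placed before myfunc so that the call is known to the compiler, and myfunc returns x (the original fragment has no return statement, so that line is a guess at the intent).

```cpp
// Function declaration: head followed by a semicolon.
int otherfunc(int val);

// Function definition: the body contains only a variable definition
// and a statement, which is what the grammar allows.
int myfunc(int val)
{
    int x = otherfunc(val);   // call (apply) otherfunc
    return x;                 // assumed: the original fragment returned nothing
}

// Function definition at the top level, where the grammar puts it.
int otherfunc(int val)
{
    return val + 10;
}
```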

 
// Is x less than 10? yes or no
string isless10(int x)
{
  // Return string "yes" if x < 10, "no" otherwise
  //   OK: if statement may contain a return statement in its body
  if (x < 10) {
    return "yes";
  }
 
  // Another way?
  //   NOT OK: return expects an expression, not a statement
  return if (x < 10) "yes";
 
  return "no";
}
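
For comparison, here is one version of isless10 that stays entirely within the grammar, using the optional else part instead of the illegal "return if". This is only one of several possible rewrites; dropping the else and ending with return "no"; would work as well.

```cpp
#include <string>
using namespace std;

// Is x less than 10? yes or no
string isless10(int x)
{
    if (x < 10) {
        return "yes";    // returnstmt inside the if body
    } else {
        return "no";     // returnstmt inside the else body
    }
}
```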
