Introduction
Programming by design using C++
Data Description
- Type definitions
- New kinds of data
Data Examples
Signatures, Purpose Statements, and Headers
- Splitting your project into files
Templates
Coding
The coding toolchain
- Where’s my GUI? Or how to run code
- What about interactions?
Testing
Review and refactoring
How to design cin/cout programs
Advanced C++ Fu
Practice problems

Introduction

Congratulations! You’ve finished COMP 50 and you’re ready to move on. This handout suggests a forward pathway: into and through COMP 15, via C++. On this path you’ll encounter new sets of ideas:

Cost modeling. Computer scientists care what things cost. The big questions are “how long does it take to run?” and “how much space does it need.”
Array-based programming. Arrays provide a new way of organizing data that is different from the structures, lists, and trees you are used to. Arrays are everywhere—even in How to Design Programs they make an appearance as “vectors”—and as you’ll learn in COMP 15, they provide a nice complement to lists: a different way of putting “things in a row,” with a different cost model.
Mutable data. In Beginning Student Language and Intermediate Student Language, you never change a structure; you just make a new structure. In the jargon of the field, structures are immutable. Immutability is great for beginners; in particular, immutability makes it easy to write purpose statements. But in many situations, “make a new structure” can be an expensive operation, and if you think about it carefully, you may instead be able to “change (aka mutate) an existing structure.” Cost models are different for different programming languages, but in C++ and C, mutation is especially cheap, and creating new structures is especially expensive.
C++ and its new syntax, concepts, and quirks. Every language designer sets out, at least at the beginning, to serve a particular target audience. The audience for BSL and ISL is first-semester students; they need very small languages that provide minimal distraction—that’s what makes BSL easy to learn. The audience for C++ is professional engineers, but not just any professional engineers. The true audience for C++ is professional engineers who are building large systems and who are obsessed with cost control. Keep this audience in mind, because it will explain many properties of C++ that you might otherwise find surprising.

Here’s a metaphor: programming ISL is like driving a car; you use the pedals and the steering wheel to get to your destination. To program C++, you must not only be able to drive the car; you must also be able to perform all the manufacturer’s recommended maintenance procedures (oil, belts, hoses, engine timing, fuel filter, brakes, transmission fluid, fuses, and much much more).

This handout explores some of things it will help you to know as you migrate from ISL to C++. The handout is organized according to the design recipe.

Programming by design using C++

Jay McCarthy at BYU has put together a course that is based on How to Design Programs but that uses C++. He has put together a nice guide for his students at http://faculty.cs.byu.edu/~jay/courses/2011/fall/142/course/. Jay’s guide contains information about design using C++ that you will find helpful.

Data Description

Type definitions

You’re familiar with these kinds of data:

Named data, including named atomic data
Data defined by parts
Data defined by choices
Data definitions in which names may be self-referential or mutually referential

All of these ideas are present in C++, but there is a great deal more: C++ includes a (somewhat inexpressive) language for data description called a type system. In ISL, the foundations of your data descriptions are named data and define-struct. In C++, all of your data descriptions have to be built on types. A key property of these types is that you have to get the compiler to swallow them. Until the compiler understands your type definitions, it won’t look at your code.

Atomic data is different

In ISL we had numbers, booleans, characters, strings, symbols, and images. Numbers could be either exact or inexact. C++ has a different set of atomic data:

Instead of two kinds of numbers, there are a zillion. The important ones are

int Machine integer

double Inexact number, aka “floating-point” number

There are no fractions (aka exact rational numbers) and no exact decimal numbers—everything that’s not an integer is inexact.

Integers are limited in size. The int type can represent integers in the range of about plus or minus two billion.

In exchange for giving up all the laws of arithmetic, the C++ engineer knows exactly how much space each number takes up and exactly how much time it takes to add, subtract, and multiply them.
char is a character, but it is limited to 8-bit ASCII characters. In C++, things like á and ö are not characters.
There are two kinds of strings. You can write "hello, world", but I don’t know what it means. You will have to find an expert who can explain it to you.
There are no symbols; instead you can define enumeration literals (described below).
There are no images.
There are “IO streams”, which are used by C++ programs to communicate with the outside world.

Writing definitions by parts

C++ is quite good at definition by parts. You have two tools available, called struct and class. The struct is most like the structures you know. Unless and until your instructor wants to do things that require class, I recommend you begin with struct. Here are some struct definitions for a Fitts’s Law problem:¹

struct Position { // (x, y) screen coordinates,
                  // upper-left origin
  int x; // distance in pixels from left edge
  int y; // distance in pixels from top edge
};

struct WaitingState { // waiting for 
                       // 2nd mouse click
  Clock clock;     // time since launch
  Position star;   // location of star's center
  Position circle; // location of circle's center
  double radius;   // circle's radius in pixels
};

In these examples, I have combined the necessary C++-isms with comments (marked in //) that turn the C++ type definition into a true data description. The data-description part says what the whole structure means and what each part stands for. The C++ type-definition part gives the name of the structure and the name and type of each part.

To make a structure in C++, you have to “declare” it (or a pointer to it) and the initialize each field by mutation. To look at the parts of a structure, you use either dot notation (for structures) or arrow notation (for pointers to structures). You can see more about this in the section on Templates.

Writing definitions by choices

C++, like most other languages that enforce type definitions, has crappy support for definition by choices. There is no support for arbitrary choices. To create a definition by choices, you have to split it into two steps:

Define an enumeration that contains one name for each choice. Such a name is an enumeration literal.
Define a structure that contains a union of the choices, together with a tag (the enumeration literal) that identifies which choice was made.

Here’s an example:

// a world-state is tagged with one of these:
typedef enum {
  INITIAL, // state after startup
  WAITING, // state after one click
  SUCCESS, // experiment ended successfully
  EXPIRED  // experiment failed
} WorldStateTag;

// a world-state is one of these choices:
structure WorldState {
  WorldStateTag tag; // which choice obtains
  union {
    InitialState initial;
    WaitingState waiting;
    SuccessState success;
    ExpiredState expired;
  };
};

I’m getting ahead of the story, but here’s a template for a pointer to a world state:

switch(state->tag) {
  case INITIAL:
    ... state->initial.clock ... ;
    break;

  case WAITING:
    ... state->waiting.clock ... 
    ... state->waiting.star ...
    ... state->waiting.circle ...
    ... state->waiting.radius ...
    break;

  ... cases for other two choices ...

  default:
    assert(false);
}

Self-referential and mutually referential definitions

If you want to make a self-referential definition, you’ll need to use a pointer, but otherwise the name is there:

// DATA: a list-of-numbers is
// either a pointer to a NumberCons
// structure or is the NULL pointer.

struct NumberCons {
  int first;
  NumberCons *rest; // list of numbers
};

typedef NumberCons *ListOfNumbers;

If you want mutual reference, you need to begin with “incomplete”: structure or class definitions:²

#include <string>
class HBoundary;
class VBoundary;
class Tree2D;

class HBoundary { // split along y
public:
  Tree2D *above; // points above y line
  double y;
  Tree2D *below; // points below y line
};

class VBoundary { // split along x
public:
  Tree2D *left; // points left of x line
  double x;
  Tree2D *right; // points right of x line
};

typedef enum {
  An_HBoundary, A_VBoundary, A_Point
} TreeTag;

class Mappoint {
  double x, y; // position
  std::string name;  // what's at the point
};

/*
DATA DEFINITION:
  A Tree2D is one of:
    - A Mappoint
    - An HBoundary
    - A VBoundary
  as identified by the 'choice' field
*/

class Tree2D {
public:
  TreeTag choice;
  union {
    HBoundary hboundary;
    VBoundary vboundary;
    Mappoint  *point;
  };
};

New kinds of data

Mutable data

In ISL, once you make a structure, the contents of that structure are fixed for all time. If you want a structure with different fields, you make a new structure.

In ISL, once a variable is defined, its value doesn’t change. Why, then, is it called a “variable”? In many programming languages, including C++, a defined name actually refers to a mutable location, which you can think of as “a box containing a value”. You can change the value by assigning to the variable.
Thus, the value varies.

In C++, mutation is accomplished by the assignment operator, which is written = and pronounced “gets”.³
Assignment makes most sense when used with C++ loops and conditionals, of where there are examples below. but here are a few examples:

int n; // n has no value
n = 7; // now n is seven
n = n + 3; // now n is 10

We threw in that last example to make the pronunciation of the assignment crystal clear: the only pronunciation that makes sense is “n gets n plus three.” If you say “n equals n plus three”, you’re not thinking clearly.

Arrays

You have plenty of practice programming with sequences of values; you use lists. Lists get used a lot in COMP 11 and COMP 15 as well (where they are called “linked lists”), but there’s more. The problem with a list is that if you want the seventh element, you have to call rest six times. The cost of all those calls to rest adds up, and the C++ engineer, who is obsessed with controlling costs, might instead choose to use an array.

The advantage of an array is that every element is available for the same cost. And the cost is quite small—typically only a few machine instructions.
The disadvantage of an array is that you can’t easily add or remove elements.⁴
Array elements aren’t accessed by using selectors like first and rest. Instead you use square brackets with the number or index of the array element.

In C and C++, indices begin at zero. Think of the index as the number of elements you have to skip before you find the one you are interested in. This “zero-indexed” model makes it very easy to write generative recursion over arrays and parts of arrays.
If you misuse first and rest, DrRacket will stop you every time. If you misuse array indices, C++ won’t say a word—it will just silently give you wrong answers, or perhaps after a few thousand or few million instructions, it will crash. A tool called valgrind can sometimes tell if you’ve misused an array index.

In C++ arrays, as in C++ lists, every element has to have the same type. This type could be defined by choices, but it need not.

If the size of the array is known at compile time, it is called a static array:

const int SIZE = 4;
string animals[SIZE];
animals[0] = "giraffe";
animals[1] = "squirrel";
animals[2] = "reindeer";
animals[3] = "squirrel";

A static array has a fixed size and holds a single type of element. Both the size and the element type are declared when the array is made. To be known at compile time, the size must be either a named constant (e.g., const int SIZE = 4) or a literal integer (e.g., 12). The example array has room for four elements, and any element can be reached at any time for constant cost. This array is initialized by *mutation*; each of the four elements is assigned to using the "gets" (=) operator. Woe betide the programmer who writesanimals[4]`; there’s nothing in that slot but a dead squirrel.

Pointers to structures

In BSL and ISL it looks like cons cells contain other cons cells, tree nodes contain other tree nodes, and so on. But that’s not the way it really works. Each cons cell or a tree node has a known, fixed size, and what it contains is a pointer to another cons cell or another tree node.

In keeping with the C++ theme of total, obsessive control, in C++ you manage the pointers. A pointer is really the address of a memory location; it “points to” the data at that address. To get the data you want, you have to “follow” or “dereference” the pointer. Making sure that the memory really does contain good data is your job—and that’s why many programmers are a little bit afraid of pointers.

A pointer shows up in a type definition (data description) anywhere you see the star (*) character. Here’s an example from above:

class HBoundary {
public:
  Tree2D *above;
  double y;
  Tree2D *below;
};

The fields above and below are pointers.

The most typical way to use a pointer is the same way you would use make-h-boundary in ISL. To do this, the first step is to define the pointer to point to fresh, clean memory that doesn’t have anything in it:

HBoundary *hbp = new HBoundary;

the next step is to initialize every field of the structure that is pointed to:

hbp->above = NULL;
hbp->y     = 3.7;
hbp->below = NULL;

Because hbp is a pointer to a structure, not a structure itself, we refer to the fields using the arrow notation. And we initialize each field using the assignment (“gets”) operator.

Usually it’s a good plan to put this kind code into a function, so you can get back to the world of making structures. I’m going to make a pointer to a full Tree2D, which itself contains an HBoundary.

Tree2D *make_h_boundary(Tree2D *t1,
                        double hline, 
                        Tree2D *t2) 
{
  Tree2D *t = new Tree2D;
  t->choice = An_HBoundary;
  t->hboundary.above = t1;
  t->hboundary.y     = hline;
  t->hboundary.below = t2;
  return t;
}

The first line initializes the pointer t to point to fresh memory. The assignment to t->choice initializes the tag. The last three assignments use dot notation to initialize each of the parts of the hboundary that is one of the choices for t.

The sooner you define these kinds of functions, the easier it will be for you to write your other codes and your test cases, and the sooner you will be back to programming the way you know how. C++ provides special support for constructor functions, which you can read about below.

More pointers

There’s far more to the pointer story then we’ve told here. Pointers are the Swiss Army knife of C++. They can be used to mess with other people’s variables, and to point to atomic data. And they are the real technology behind C++ arrays, especially dynamic arrays. You can see more in the section at the end of this document on Advanced C++ Fu.

Data Examples

C++ has no built-in support for data examples. And it also places severe restrictions on what values you can give to global variables. So if you want to write data examples, the only real alternative is to put them in functions:

Tree2D *tree_example_1(void) {
  return make_tree_mappoint(-71.05, 42.36, "Boston");
}
Tree2D *tree_example_2(void) {
  return make_tree_mappoint(-71.42, 41.82, "Providence");
}
Tree2D *tree_example_3(void) {
  return make_h_boundary(tree_example_2(),
                         41.0,
                         tree_example_1());
}

Signatures, Purpose Statements, and Headers

In C++, the signature of a function, like a data definition, is part of the language, and the C++ compiler has to approve it. In C++, the signature and header are combined; the resulting combination is sometimes called the function’s signature and sometimes is called the function’s prototype.

Just is in BSL and ISL, the signature or prototype gives the name of the function, says what kind of data is expected for each parameter and what kind of data is returned. It also gives the name of each parameter (which in BSL is done by the header). Because the name should be the very first thing we see, the purpose statement is usually written after the prototype.

Here are some examples:

Mappoint *nearest_point(Tree2D *tree,
                        double x,
                        double y);
// returns the point in the given tree
// that is closest to the given (x,y)

int trigram_count(Model *m,
                  char c1,
                  char c2,
                  char c3);
// returns the number of times the 
// trigram c1-c2-c3 occurs in the
// given model

int observation_count(Model *m);
// returns the total number of observations
// used to build the model

Splitting your project into files

DrRacket works with one file at a time, and so we set things up so that each of your projects fit into a single file. But when projects get big, like the trigrams project, it is actually helpful to build and test files separately. In C++ this is very easy:

Your data definitions, function prototypes, and purpose statements normally go into a separate file called the header file. If you implement a binary search tree, you might call the header file something like bst.h.
Your data examples, function definitions, and unit tests go into another file, which might be called something like bst.cpp.

Templates

Template for atomic data

Just as in BSL and ISL, what you do with atomic data is either pass it to your own functions or use built-in functions. But in C++, some functions are called operators and are written in infix notation. An infix operator is a function that takes two arguments, and a call to the function is written with the operator between the arguments. Examples:

n == 0
x + y
n + 1
(x - 1) * (x + 1)
n % 4
cout << "Hello, world"

None of this stuff is essential to systematic problem-solving, but it is all stuff that you have to memorize or look up in order to get anything done.

Here are a couple of traps and pitfalls:

The equality operator is written with two equals signs (==), and it doesn’t always mean what you would hope it means.

Despite being written with two equals signs, the == is still pronounced “equals.” (There is another operator which is written with a single equals sign and is pronounced “gets.”) If you say “double equals,” perhaps you are a troglodyte, or perhaps you are merely keeping bad company.
When used on whole numbers (integers), operators / and % mean quotient and remainder. When used on floating-point numbers (double), / means floating-point (inexact) division. C++ does not have any operator that corresponds to ISL’s exact division (which is also written with the slash).
Operators have precedence and associativity, which allows you to combine multiple operators without parentheses. Be cautious about using this. Norman recommends following only a few established idioms, and being sure you know what they mean. With mixtures of different operators, parentheses are never a bad thing.

Templates for structures & classes (defined by parts)

The template for a C++ structure or class is the same as in BSL, but it is written using dot notation.

In BSL we have

;; DATA DEFINITION
;; a time is a structure:
;;   (make-time HH MM SS)
;; where HH, MM, and SS 
;; are whole numbers such that
;;   0 <= HH < 24
;;   0 <= MM < 60
;;   0 <= SS < 60
(define-struct time (hours minutes seconds))

;; DATA EXAMPLES
(make-time  9 15 0)
(make-time 13 30 0)

In C++ we have

struct Time {
  int hours;   // 0 <= hours < 24
  int minutes; // 0 <= minutes < 60
  int seconds; // 0 <= seconds < 60
};

Time time_example_1(void) {
  Time t1;
  t1.hours = 9;
  t1.minutes = 15;
  t1.seconds = 0;
  return t1;
}

Time time_example_2(void) {
  Time t2;
  t2.hours = 13;
  t2.minutes = 30;
  t2.seconds = 0;
  return t2; // don't warn
}

In BSL the template for a function consuming a time is

;; time-template : time -> ?
(define (time-template t)
   (... (time-hours t) (time-minutes t) 
        (time-seconds t)))

In C++ the template for a function consuming time is

? time_template(Time t) {
  ... t.hours ... t.minutes ... t.seconds
}

In C++ we often work with pointers to structures. Here is a standard constructor function for Time:

Time *make_time(int hh, int mm, int ss) {
  // In BSL this function is built in;
  // in C++ you have total control and
  // you get to write it...
  Time *t = new Time;
  t->hours = hh;
  t->minutes = mm;
  t->seconds = ss;
  return t;
}

Time *time_ptr_example_1(void) {
  return make_time( 9, 15, 0);
}

Time *time_ptr_example_2(void) {
  return make_time(13, 30, 0);
}

The template for a function consuming a pointer to time is

? time_template(Time *t) {
  ... t->hours ... t->minutes ... t->seconds
}

Defining and using constructor functions

If you use a C++ class instead of struct, you can build the make idea in as a C++ constructor. The constructor allows you to pass parameters to new. This idiom is about as close as you can get to BSL:

class Time {
public:
  int hours;   // 0 <= hours < 24
  int minutes; // 0 <= minutes < 60
  int seconds; // 0 <= seconds < 60
  Time(int hh, int mm, int ss);
};

Time::Time(int hh, int mm, int ss) {
  this->hours = hh;
  this->minutes = mm;
  this->seconds = ss;
}

Time *time_ptr_example_1(void) {
  return new Time(9, 15, 0);
}

Time *time_ptr_example_2(void) {
  return new Time(13, 30, 0);
}

Given that we’re doing all this mutation anyway, we should take advantage of the situation to enforce more of the data description from within the constructor function:

Time::Time(int hh, int mm, int ss) {
  assert(0 <= hh && hh < 24);
  assert(0 <= mm && mm < 60);
  assert(0 <= ss && ss < 60);
  this->hours = hh;
  this->minutes = mm;
  this->seconds = ss;
}

Defining constructor functions for a definition by choices

COULD SOME KIND PERSON WRITE AN EXAMPLE HERE? I’M LOOKING FOR BEST PRACTICES SO THAT IF I HAVE A CLASS THAT IS DEFINED BY CHOICES, I CAN DEFINE ONE FUNCTION FOR EACH CHOICE (IT CAN’T ALWAYS BE A CONSTRUCTOR). CALLING ANY OF THE FUNCTIONS PRODUCES AN OBJECT OF THE CLASS.

Templates that use conditionals

In C++, you get only one condition at a time, and you typically use the statement form with if and else.

Here’s BSL:

;; climate : number -> string
;; return expected climate of given latitude
(define (climate latitude)
  (cond
    [(< (abs latitude) 23.5) 'tropical]
    [(< (abs latitude) 35)   'subtropical]
    [(< (abs latitude) 50)   'temperate]
    [(< (abs latitude) 70)   'subpolar]
    [(<= (abs latitude) 90)   'polar]))

And here’s C++

#include <cmath>
#include <cassert>

typedef enum {
  TROPICAL, SUBTROPICAL, TEMPERATE,
  SUBPOLAR, POLAR
} Climate;

Climate climate(double latitude) {
  if (fabs(latitude) < 23.5) {
    return TROPICAL;
  } else if (fabs(latitude) < 35) {
    return SUBTROPICAL;
  } else if (fabs(latitude) < 50) {
    return TEMPERATE;
  } else if (fabs(latitude) < 70) {
    return SUBPOLAR;
  } else if (fabs(latitude) < 90) {
    return POLAR;
  } else {
    assert(false);
  }
}

Yes, absolute value is spelled differently.

Enumeration literals

To choose among enumeration literals, or among any set of known integers, use switch:

bool has_giant_insects(Climate c) {
  switch (c) {
    case TROPICAL:    return true;
    case SUBTROPICAL: return true;
    case TEMPERATE:   return false;
    case SUBPOLAR:    return false;
    case POLAR:       return false;
    default:          assert(false);
  }
}

Definition by choices

The template for a definition by choices is to switch on the tag and then choose the corresponding member from the union. There’s an example in the section on data definition by choices; here’s just the template, repeated:

switch(state->tag) {
  case INITIAL:
    ... state->initial.clock ... ;
    break;

  case WAITING:
    ... state->waiting.clock ... 
    ... state->waiting.star ...
    ... state->waiting.circle ...
    ... state->waiting.radius ...
    break;

  ... cases for other two choices ...

  default:
    assert(false);
}

Natural recursion and mutual recursion

Natural recursion and mutual recursion work almost exactly as in BSL. The only difference is that if you write mutually recursive functions, you have to declare the functions, with prototypes, before you define them.

#include <cassert>
#include <cstdlib>

class Atom;
class Sx;
class SxCons;

// an SxList* is either a pointer
// to an SxCons or is NULL;
typedef SxCons SxList;

typedef enum { NUMBER, CSTRING, BOOL } 
AtomTag;

class Atom {
  public:
  AtomTag tag;
  union {
    double number;
    char *cstring;
    bool   b;
  };
};

typedef enum { ATOM, SXLIST } SxTag;
class Sx {
public:
  SxTag tag;
  union {
    Atom *atom;
    SxList *list;
  };
};

class SxCons {
public:
  Sx *first;
  SxList *rest;
};

bool has_number_atom(Atom *a);
bool has_number_sx(Sx *sx);
bool has_number_sxlist(SxList *sxs);

bool has_number_atom(Atom *a) {
  return a->tag == NUMBER;
}

bool has_number_sx(Sx *sx) {
  switch (sx->tag) {
    case ATOM:
      return has_number_atom(sx->atom);
    case SXLIST:
      return has_number_sxlist(sx->list);
    default:
      assert(false);
  }
}

bool has_number_sxlist(SxList *sxs) {
  if (sxs == NULL) { // empty list
    return false;
  } else {
    return has_number_sx(sxs->first) ||
           has_number_sxlist(sxs->rest);
  }
}

Arrays

Lists

Coding

Organizing projects into files

Alternatives to standard abstract list functions

Here’s an ISL function:

;; has-even? : (listof number) -> boolean
;; tell if the given list contains an even number
(define (has-even? ns)
  (ormap (lambda (n) (= (remainder n 2) 0)) ns))

In C++, you’d use a for loop instead of the abstract list function, and you’d use return to make an early exit from the loop:

bool has_even(NumberList ns) {
  // tell if the given list of numbers contains an even number

  for (NumberList p = ns; p != NULL; p = p-> rest) {
    if (p->first % 2 == 0) {
      return true; // causes
    }
  }
  return false;
}

You can also use for loops for code you would write using andmap or foldl. To replace code written using map, filter, or foldr, you would use natural recursion.

CAN SOMEONE BUILD OUT SOME EXAMPLES SO WE SHOW THE ABSTRACT LIST FUNCTION IN ISL AND THEN THE CORRESPONDING LOOP IN C++?

Alternatives to recursion

LOOPS!

LOOPS ALWAYS INVOLVE MUTATION, MUTABLE STATE, OR POSSIBLY I/O.

THERE ARE SOME INTERESTING PARALLELS TO RECURSIVE FUNCTIONS.

ANALOG OF CERTAIN VERY SIMPLE STRUCTURAL RECURSIONS ON LISTS:

for ( NumberList *ns = stuff
    ; ns != NULL
    ; ns = ns->next
    ) 
{
   ... ns->first ...
}

THIS TEMPLATE IS RELATED TO THE ABSTRACT LIST FUNCTION foldl.

THERE ARE ALSO ANALOGS OF FOLD FOR ARRAYS. LETS SUPPOSE THAT THERE ARE N ELEMENTS IN AN ARRAY. THEN

for (int i = 0; i < n; i++) {
  // loop analogous to foldl
  // (very common in C++)
  ... a[i] ...;
}

for (int i = n; i-- > 0; ) {
  // loop analogous to foldr
  // (very rare in C++)
  ... a[i] ...;
}

IN THE GENERAL CASE, A LOOP ACTS LIKE A GENERATIVE RECURSIVE FUNCTION. EVERY LOOP HAS A “LOOP INVARIANT” WHICH ACTS LIKE THE PURPOSE STATEMENT OF THE RECURSIVE FUNCTION. AND EVERY LOOP MUST HAVE A TERMINATION ARGUMENT, HAVING TO DO WITH THE SIZE OF THE PROBLEM SOMEHOW GETTING SMALLER.

The coding toolchain

Where’s my GUI? Or how to run code

DrRacket is an example of an integrated development environment, or IDE. The environment is “integrated” because everything you need is integrated into DrRacket. In other CS courses, as in the professional world, programmers often use an environment that is made up of different components. Here the components we recommend you use in COMP 15:

To create and modify the text of your program, you’ll need an editor. The editor is the best available analog of DrRacket’s Definitions window, and a good one will do at least some of the coloring and highlighting that you’ve been accustomed to from DrRacket.

Norman recommends Vim. If you’ve already learned Emacs, that’s also an acceptable choice. Sublime Text is highly thought of, as is TextMate for the Mac.

Don’t learn gedit. If you’re already using gedit, learn something better. If you don’t already use Emacs, don’t start; friends don’t let friends learn Emacs.

The Computer Science Exchange will probably be willing to sponsor something about learning editors. And if you like retro games, there’s Vim Adventures.
To turn the text of your program into something you can run, you’ll need a compiler. When you compile a program, it is translated directly to the machine instructions of the target machine. You can compiler your program once and run it over and over. But if you touch the source code, you’ll have to compile again.

What compiler should you use? Norman strongly recommends clang++. There is another compiler on the system called g++, but if anything goes wrong, clang++ does a much better job telling you what happened. Confirm with your COMP 15 instructor that clang++ is OK.
Just as when using DrRacket you should press Run very often, when using a compiler you should compile very often. But compiling is more complicated than pressing Run. Part of the complexity arises because in COMP 15, you’ll be building programs from multiple parts. In ISL, if you require a part, DrRacket just compiles it for you. But in C++, if you #include a part, the compiler is not going to compile it for you—the compiler thinks you’re a professional engineer and that you want total control over what things are compiled and when. To control compilation and to build programs from multiple parts, you’ll need a compile script.

Here is a short example scripts: if you have a binary search tree in bst.cpp and a trigrams project in trigrams.cpp, you might write the following super-simple compile script:
```
#!/bin/sh

set -e    # halt on first error

# these flags max out warnings and debug info
FLAGS="-g -O -Wall -Wextra -Werror
                -Wfatal-errors -pedantic"

rm -f *.o  # make sure no object files
           # are left hanging around

clang++ $FLAGS -I. -c bst.cpp
clang++ $FLAGS -I. -c trigrams.cpp
clang++ $FLAGS -o trigrams bst.o trigrams.o -lm
```
Here is a somewhat more sophisticated compile script for the same project:
```
#!/bin/sh

set -e    # halt on first error

# use 'clang++ as the compiler (or try 'g++')
CC=clang++

CFLAGS="-I."  # find my own *.h files
LIBS="-lm"    # might add more libraries
              # for some projects

# these flags max out warnings and debug info
FLAGS="-g -O -Wall -Wextra -Werror
                -Wfatal-errors -pedantic"

rm -f *.o  # make sure no object files
           # are left hanging around

# compile each file in the project
for cppfile in bst.cpp trigrams.cpp
do
  $CC $FLAGS $CFLAGS -c $cfile
done

# link together .o files + libraries 
# to make an executable binary
$CC $FLAGS -o trigrams bst.o trigrams.o $LIBS
```
Once you have your compile script, to build your project, you need a Terminal window (or your editor) to run the command ./compile or sh compile. Do this insanely often. Nothing keeps trouble away like compiling insanely often.
As you build more complex projects, you’ll want to learn how to write your own compile scripts. Norman has put together an entire handout on compile scripts, but it is written for COMP 40 students who are using C. The rules for C and C++ are so similar, however, that this handout may still be useful. In particular, the handout includes an all-purpose compile script at the end.
If your compile script finishes successfully, it will produce an executable binary called trigrams. This binary is just another file, but you can run it like a command or a program:
```
./trigrams -train corpora
./trigrams -classify URL
```

What about interactions?

Using DrRacket, once you write any code, you can interact with it. The cost of providing this kind of interaction is considerable, and you have to pay the cost whether you use the interaction or not. Because C++ is all about controlling costs, this kind of interaction is not provided. Instead, if you want any interactions, you code them yourself.

The most basic form of interaction is simply to run a program and see what happens. For more interesting interactions, you use input and output, also called I/O. It is a little bit like the experience of writing S-expressions to files and reading them back again, except that C++ I/O works only with simple atomic data.

To write information, use the standard output stream cout, with the << operator.
To read information, use the standard input stream cin, with the >>. Here’s an example of a program that reads a number and repeats it:

#include <iostream>

using namespace std; 
  // combined with #include, acts
  // a little like `require`

int main(void) {
  cout << "What is your favorite number? ";
  int n; // favorite number
  cin >> n;
  cout << "You said " << n << endl;
  return 0;
}

An expression like cout << "hello" is pronounced “write cout from "hello",” and cin >> n is pronounced “read from cin into n.”

If you put the code above into a file called interact.cpp, then compile it using

g++ -o interact interact.cpp

you can then run it using

./interact

and it will ask you for a number, read it into the variable n, then print it out again. Writing endl to cout means to print a blank line.

At the top of the file, #include<iostream> means “I require the I/O library”, and using namespace std; means “I want to refer to library things by their short names.”

Testing

Testing is where you pay the biggest cost for the C++ programmer’s ability to control everything. I don’t want to complain too much—that total control enables Google, Apple, and Microsoft to create many lovely things—but it does make testing a pain in the ass.

You might be tempted to give up on unit testing entirely. Don’t. Here are some relatively “cheap and cheerful” things you can try:

Use assertions liberally. (You can find examples of assert throughout this document.) One advantage of the “expressions and statements” syntax used by C++, as opposed to the “expressions only” syntax of BSL and ISL, is that it is often relatively easy to toss in an assertion which acts as a very simple test that is run every time a function is run.
Create functions whose only purpose is to run tests. Start with simple tests, then add sophistication. An example of a super-simple test is that the result of an operation is not NULL, or that the result of a tick operation has a minutes field of at most 59:
```
void time_tests_1(void) {
  assert(make_time(12, 30, 00) != NULL);
}

void time_tests_2(void) {
  assert(tick(time_ptr_example_3())->minutes < 60);
}
```
DrRacket measures test coverage, which means simply which code is executed by your tests. As a bonus, it also provides a great visualization. For C++ you will use a separate tool called gcov. Norman recommends looking at high-voted questions tagged gcov at http://stackoverflow.com.
In order to re-create some of the power of check-expect, you will need to be able to test when two values are equal. Thus, part of the design recipe for every structure and class definition, which encompasses both definition by choices and definition by parts, is to redefine the built-in equality operator. Get an expert to show you how to redefined ==.

WOULD SOME KIND PERSON LIKE TO PROVIDE AN EXAMPLE HERE?
In order to re-create some of the power of check-expect, you will need to be able to print values. This goal can be accomplished by defining the overloaded operator <<. It also becomes part of the design recipe for every structure and class definition.

WOULD AN EVEN KINDER PERSON LIKE TO PROVIDE AN EXAMPLE?

DrRacket runs all your tests automatically. To run tests in C++, you will need to write code. This code should go into a separate file, something like bst-tests.cpp or trigrams-tests.cpp, each of which has a main function that runs all the bst or trigrams tests.

Once you have these files, you not only put them in your compile script; you make sure the compile script runs the tests. Here’s a fragment from an example compile script:

clang++ $FLAGS -I. -c bst.cpp
clang++ $FLAGS -I. -c bst-tests.cpp
clang++ $FLAGS -o bst-tests bst-tests.o bst.o -lm
./bst-tests  # run the tests of the bst functions

clang++ $FLAGS -I. -c trigrams.cpp
clang++ $FLAGS -I. -c trigrams-tests.cpp
clang++ $FLAGS -o trigrams-tests \
                  trigrams-tests.o trigrams.o
./trigrams-tests  # run tests of trigram functions

clang++ $FLAGS -o trigrams bst.o trigrams.o -lm
                  # build the trigrams project

Once you have the key pieces in place (equality, printing, testing on every build), you may want to upgrade from assert to the more powerful CppUnit unit-testing framework. This framework has good press and was recommended to Norman by C++ programmers that he trusts. There’s an example in one of the class demos from Fall 2013.

Review and refactoring

How to design `cin`/`cout` programs

You have learned a systematic design process for interactive big-bang programs. You have also written one “batch” program (the trigrams project), and you can read more about the design of batch programs in Section 2.4 of *How to Design Programs (Second Edition).

You would benefit from a systematic design process for designing interactive programs that use cin and cout instead of the big-bang graphics engine. The idea of “world state” can still apply to such programs, but we don’t have any details for you. Yet.

Advanced C++ Fu

Advance topic: classes and methods

NOTE: NR THINKS THAT THE INVISIBLE this PARAMETER IS TOO ADVANCED FOR BEGINNING STUDENTS.

Classes are structures whose members can be declared to be public or private. As with structures, the members may be any kind of data, including functions, but member functions of a class have special names; they are called methods. Public members of a class are available to any code; private members are available only to the methods defined in the class.

Each class has special methods called constructors. You can identify the constructor because it doesn’t have a name; it just has the return type, which is the same as the class name. The constructor is called when a new class object is made, and it says how to initialize the class variables. It’s good to define a constructor method; if you don’t, C++ will supply a method that fills your variable with dead squirrels.

Here’s a sample class definition:

class Oven{
  public:
  Oven(); //default constructor.
  void increaseTemp(int degrees);
  // add the given temperature to the receiver

  void turnOff();
  // turn off the recevier

  int  getTemp();
  // tell the temperature to which 
  // the receiver is currently set

  private:
  int  temperature;
  bool isOn;
};

The implementation is where the functions are actually defined, and is usually placed in files with a .cpp extension. Notice the :: operator in the following sample. This is the scope operator, it means that the function being defined is part of the Oven class. The following is a sample oven.cpp file.

#include "oven.h"

// initializes a new oven to be off.
Oven::Oven() {
  temperature = 0;
  isOn = false;
}

// changes the temperature by degrees.
void Oven::changeTemp(int degrees) {
  temperature += degrees;
}

// turns off oven.
void Oven::turnOff() {
  isOn = false;
}

// gets the temperature.
int getTemp() {
  return temperature;
}

More pointers

The single most profitable investment you can make in understanding pointers is to watch the video Pointer Fun with Binky at Stanford. The video took 100 hours to make and is well worth the several minutes it takes to watch.

Here’s a pointer that doesn’t point to anything.

int *px;

Now we can initialize it:

px = new int;

This code creates a brand new bit of memory and tells the pointer the address of it. Now the pointer has an address, but there isn’t any value there.

*px = 5;

The * operator dereferences the pointer: it goes to the pointer’s address and accesses the value there.

WHY? DO WE HAVE TO GO HERE? There are two ways to assign the pointer to an address. The first is to use an existing variable:

int x = 5;
px = &x;

The & operator gives the address of a variable. The second way is to allocate new space for the pointer:
px = new int;
If pointers remain confusing, often a metaphor can help lead you to a better understanding. If you like, you can think about pointers as pieces of paper with just enough space to write down the address of someone’s house. Of course this pointer itself is not the actual house! To visit the house itself, we must use the address on the piece of paper to help us find the actual house; this is known as dereferencing, and is accomplished using the * operator. We can look at the house referenced by the pointer (*px), or we could put an entirely different house at the same address (*px = 47;). If you have a house in a variable (i.e. a non-pointer variable, call it x), you can get it’s address using the & operator. You can write down this address into a pointer: px = &x;, as above. It is also possible to build a new house and assign its address to a pointer. We do this using the new operator: px = new int;.

Advanced loops

Loops

THIS SECTION MUST BE REVISED. WE DO NOT JUST THROW SYNTAX AND EXAMPLES AT PEOPLE. WE EXPLAIN HOW TO USE LOOPS AS PART OF A SYSTEMATIC PROCESS OF DESIGN AND PROBLEM SOLVING.

In order to do interesting things with arrays, we need a better way to move through them than directly indexing into each element. In Racket, you used recursion to operate on every element in a list. In , we use loops. There are three kinds of loops: for, while, and the somewhat less common do-while. All three have boolean condition. When the condition is not satisfied (does not evaluate to true), the loop ends.

I AM REALLY NOT LIKING THESE EXAMPLES BASED ON I/O BECAUSE OUR STUDENTS HAVE ZERO EXPERIENCE WITH ELEMENT-AT-A-TIME BATCH I/O. MY OPINION IS THAT THIS STUFF NEEDS TO BE IN A SEPARATE SECTION. I’D LIKE TO BEGIN WITH EXAMPLES THAT USE LOOPS COMPUTATIONALLY WITH SIDE EFFECTS, E.G., THE SUM OF ALL VALUES IN AN ARRAY, THE MINIMUM VALUE IN AN ARRAY, THE SAME WITH LINKED LISTS, AND SO ON.

AND JUST AS WE WOULD NEVER SHOW A FUNCTION WITHOUT A PURPOSE STATEMENT, WE SHOULD NEVER SHOW A LOOP WITHOUT AN INVARIANT.

int sum = 0;
for (int i = 0; i < n; i++) {
  // INVARIANT: `sum` contains the
  // total of element 0 through i-1
  // of array `a`

  sum = sum + a[i];
}
// RESULT: sum contains the total
// of all elements of a[]

The first type is a for loop. These are often used with dynamic arrays because it is easy to specify a certain number of iterations. The header of the loop has three parts: a declaration for the iterator, a boolean condition, and a way to change the iterator[^1]. The body of loop, in curly brackets, is the part that is repeated. This for loop will print out all the animals in the list.

  for (int i = 0; i < SIZE; i++) {
    // purpose: to write animals[i] to cout
    cout << animals[i] << endl;
  }

The second kind of loop is called a while loop. They are used when you don’t necessarily know how many iterations you’ll be doing. This kind of loop has a boolean condition in the header, and a body that is repeated. The following program reads in a number from the user and prints out all the numbers between x and 10.

  int x;
  cin >> x;
  while (x < 10) {
    cout << x;
    x++;
  }

The expression x++ operator mutates the value of x by adding one. It is pronounced “x incremented”; the operator is pronounced “increment.” You may also see an expression like ++x; evaluating this expression has a the same effect on x, but its value is different. There is a similar decrement operator –-.

The do-while loop is used when you know you want to perform the body of the loop at least once. In the example below, the boolean condition is at the bottom. The user can type in their favorite animals. When they are done, they use the keyword “none” to stop.

  string animal;
  do {
    cout << "What is your favorite animal? ";
    cin >> animal;

    if (animal == "chicken") {
      cout << "That's my favorite, too!" << endl;
    } else if (animal != "none") {
      cout << "Oh, that's nice..." << endl;
    }
  } while (animal != "none");

Practice problems

Conditionals

Minnie owns a mini golf course. Write a program that determines the price of a round of mini golf given the following rules.

The base price is $9.
If the temperature is over 80 degrees, subtract 10 cents for every degree above 80.
If the temperature is under 65 degrees, subtract 10 cents for every degree below 65.
Add $1 if it is a weekend or the hour is 5pm or later.
Subtract $2 if it is raining.

Loops

Write a for loop that prints out every other element in a list, starting with the zero-th.
Write program that counts how many squirrels are in a list (as in the I/O list).
Write a program that reads strings from the user and puts them in a list until it sees the word “DONE”. This is called a sentinel value; it indicates that we have reached the end of the interesting information in a list.

Arrays

Given an array of integers, mutate the array until it is sorted.

Although the builtin atomic types int and double begin with lower-case letters, lots of C++ programmers like to use capital letters to name the types they define themselves. When in Rome…↩
In this example I’ve used different naming conventions for the enumeration literals and the tag in my definition by choices. What’s constant here is that I have a structure or class with two elements: an enumeration literal and a union.↩
If you say “equals”, you are either badly educated or you are a troglodyte.↩
COMP 11 students learn to work around this problem by creating something called a dynamic array. You will learn the same.↩

`int`	Machine integer
`double`	Inexact number, aka “floating-point” number

Emigrating from COMP 50

COMP 50 Staff

Fall 2013