A pointer is a special kind of value: it holds the address of some other data item (a variable, struct, or object). Every data item in memory resides at some particular address, and we can use the ampersand operator (&) to ask for that address. We can then use the star operator (*) to follow the address to the data:
Given any type T:
    T v;        declares a variable of type T (a regular declaration, like int i;)
    T* pv;      declares a variable of type pointer-to-T
    pv = &v;    gets the address of v and stores it in pv. We say that pv points to v.
    *pv         follows the pointer and retrieves a value of type T from that address
Here are some very basic examples just using ints:
    int main()
    {
        int x = 10;
        int y = 20;
        int * p;

        p = &x;        // p holds address of x
        (*p) = 15;     // go to the address, store 15 there -- same effect as saying x = 15
        putInt(x);     // prints 15

        p = &y;        // now p holds address of y
        putInt(*p);    // prints 20 -- same as saying putInt(y)
    }
Notice that we can create different conceptual "shapes" with pointers. Compare these two code fragments -- one creates two pointers to two different ints, the other creates two pointers to the same int:
    int main()
    {
        int x = 10;
        int y = 20;
        int * p1;
        int * p2;

        // -- Two pointers to two different ints
        p1 = &x;
        p2 = &y;
        (*p1) = 7;      // Modify value of x
        putInt(*p2);    // Prints 20 (value of y)

        // -- Two pointers to the same int
        p1 = &x;
        p2 = &x;
        (*p1) = 7;      // Modify value of x
        putInt(*p2);    // Prints 7 (value of x)
    }
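One way to tell which of the two shapes we have is to compare the pointers themselves: two pointers are equal exactly when they hold the same address. Here is a minimal sketch of that idea. The helper names pointToSameInt and writeThenRead are made up for this illustration and are not part of the course code.

```cpp
#include <cassert>

// Hypothetical helper: true when both pointers hold the same address.
bool pointToSameInt(const int* a, const int* b) {
    return a == b;   // compares the addresses, not the ints they point to
}

// Hypothetical helper: store newValue through p1, then read through p2.
// If p1 and p2 point to the same int, the caller gets newValue back;
// otherwise the caller gets the unchanged value of the other int.
int writeThenRead(int* p1, int* p2, int newValue) {
    assert(p1 != nullptr && p2 != nullptr);  // sketch-level sanity check
    *p1 = newValue;
    return *p2;
}
```

With x = 10 and y = 20, calling writeThenRead(&x, &y, 7) returns 20 (the first shape), while writeThenRead(&x, &x, 7) returns 7 (the second shape).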
Pointers are commonly used to create referential relationships between structs. In class we discussed the example of encoding information about sequels in the movie struct:
    struct Movie
    {
        string title;       // called "title" to match the code that follows
        int year;
        int gross;
        string director;
        string cast[5];
        Movie * sequelTo;   // Pointer to the original movie
    };
We can then declare separate structs and connect them with pointers:
    int main()
    {
        Movie starwars;
        Movie empire;

        // -- Represent the fact that empire is sequel to starwars (the original)
        //    Store address of starwars in sequelTo field of empire
        empire.sequelTo = &starwars;

        // -- Print the title of the previous movie (original)
        Movie * ptr_to_original = empire.sequelTo;   // Gives us a pointer to starwars
        putString( (*ptr_to_original).title );       // Get the title (presumably, "Star Wars")
    }
The "arrow" syntax (->) is both more concise and generally produces code that is easier to read. Arrow is equivalent to star followed by dot: p->f means the same thing as (*p).f.
    Movie * ptr_to_original = empire.sequelTo;

    // -- Print title of previous movie (not too bad):
    putString( (*ptr_to_original).title );

    // -- With arrow (more concise):
    putString( ptr_to_original->title );

    // -- Go back two movies (now it's getting ugly):
    putString( (*(*ptr_to_original).sequelTo).title );

    // -- With arrow (much better):
    putString( ptr_to_original->sequelTo->title );
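Chaining arrows works when we know how many steps back to go. When we don't, we can follow the sequelTo pointers in a loop. The sketch below assumes a convention the notes do not state: a movie with no predecessor has sequelTo set to nullptr. The Movie struct is trimmed to the two fields the example needs, and findOriginal is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// Trimmed-down Movie for this sketch only.
struct Movie {
    std::string title;
    Movie* sequelTo = nullptr;   // assumption: nullptr means "no earlier movie"
};

// Hypothetical helper: follow sequelTo links back to the first movie in a series.
const Movie* findOriginal(const Movie* m) {
    assert(m != nullptr);            // we need a movie to start from
    while (m->sequelTo != nullptr) {
        m = m->sequelTo;             // same as m = (*m).sequelTo
    }
    return m;                        // the movie with no predecessor
}
```

If jedi.sequelTo points to empire and empire.sequelTo points to starwars, then findOriginal(&jedi) returns the address of starwars, no matter how long the chain is.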
Pointers solve an important problem when working with structs (and classes): how to allow functions to modify the contents of a struct instance or object without having to copy all the data back and forth.
Remember that C++ is, by default, a pass-by-value language: everything you pass
into a function is copied. For small data items, like a single char or int, you
can simply copy values in, compute a new value, and return it. For larger data
items, like struct instances, this copying can be cumbersome and
inefficient. Consider the following example -- not only does the function
changeName fail to do what we want, it makes an expensive copy of
the Movie object:
    void changeName(Movie m);   // declared before main so the call below compiles

    int main()
    {
        Movie sw;
        sw.title = "Star Wars";

        changeName(sw);         // Call changeName on a *copy* of sw
                                // the original sw is unchanged
        putString(sw.title);    // Prints "Star Wars"
    }

    void changeName(Movie m)
    {
        m.title = "Empire Strikes Back";   // Works on a copy, so the effects are
                                           // not visible to main
    }
The key idea in pass-by-reference is that I can pass a pointer to the object, giving the called function access to the exact object, not a copy. Here is the same code, modified to use pass-by-reference -- now it works as expected. Also notice that I am still copying the argument. The difference is that what I'm copying is just the address.
    void changeName(Movie * pm);   // declared before main so the call below compiles

    int main()
    {
        Movie sw;
        sw.title = "Star Wars";

        changeName(&sw);        // Call changeName giving the address of sw
        putString(sw.title);    // Prints "Empire Strikes Back"
    }

    void changeName(Movie * pm)
    {
        // -- pm holds the address of a Movie -- in this case, the address of "sw" in main.
        //    In effect, this allows changeName to reach back into the data in main
        //    and change it.
        pm->title = "Empire Strikes Back";   // in effect, sets sw.title in main
    }
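To see why copying "just the address" is cheaper, we can ask the compiler how big each argument is. This is a rough sketch, not a benchmark: the exact sizes depend on the compiler and machine, and sizeof counts only the struct itself, so the real cost of copying the strings inside a Movie can be higher. The function names are hypothetical, and the default values on the fields are my addition so that copying a freshly declared Movie is well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Movie {
    std::string title;
    int year = 0;
    int gross = 0;
    std::string director;
    std::string cast[5];
    Movie* sequelTo = nullptr;
};

// Size of the argument when a whole Movie is passed by value.
std::size_t bytesPassedByValue()   { return sizeof(Movie); }

// Size of the argument when only the address is passed.
std::size_t bytesPassedByPointer() { return sizeof(Movie*); }

// Works on a copy: the caller's Movie is unchanged.
void changeNameByValue(Movie m) {
    m.title = "Empire Strikes Back";
}

// Works on the caller's Movie through its address.
void changeNameByPointer(Movie* pm) {
    assert(pm != nullptr);   // sketch-level sanity check
    pm->title = "Empire Strikes Back";
}
```

On a typical 64-bit machine the pointer is 8 bytes, while the Movie struct is far larger because it holds seven strings.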
One of the most useful applications of pointers is to create indices: alternative orderings of the same set of structs. For example, we can read in all of our movie structs, and then have separate indices ordered by name or year or gross receipts. The underlying data structure is an array of pointers to structs:
    int main()
    {
        // -- Three movies
        Movie m1;
        Movie m2;
        Movie m3;

        Movie * byName[3];   // Pointers to movies ordered by name

        // -- Assuming m2, m1, m3 is the right order:
        byName[0] = &m2;
        byName[1] = &m1;
        byName[2] = &m3;

        Movie * byYear[3];   // Pointers to movies ordered by year

        // -- Assuming m3, m2, m1 is the right order
        //    (these assignments must go into byYear, not byName)
        byYear[0] = &m3;
        byYear[1] = &m2;
        byYear[2] = &m1;
    }
We can then search different indices depending on what our user is trying to do. With pointers we can have different orderings without having to move around or duplicate the underlying movie data.
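Rather than working out the right order by hand, we can let the library sort the pointers for us. The sketch below is one possible way to do it, under my own assumptions: it uses std::sort from the standard library, a trimmed-down Movie with only title and year, and hypothetical helper names (earlierYear, earlierTitle, buildIndex). Only the pointers get rearranged; the movies stay where they are.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Trimmed-down Movie for this sketch only.
struct Movie {
    std::string title;
    int year = 0;
};

// Ordering rules: each compares two movies through their addresses.
bool earlierYear(const Movie* a, const Movie* b)  { return a->year < b->year; }
bool earlierTitle(const Movie* a, const Movie* b) { return a->title < b->title; }

// Hypothetical helper: fill index[0..n-1] with the addresses of movies[0..n-1],
// then sort the pointers (not the movies) using the given ordering rule.
void buildIndex(Movie movies[], int n, Movie* index[],
                bool (*before)(const Movie*, const Movie*)) {
    assert(before != nullptr);   // we need an ordering rule
    for (int i = 0; i < n; i++) {
        index[i] = &movies[i];
    }
    std::sort(index, index + n, before);
}
```

Calling buildIndex twice on the same array, once with earlierYear and once with earlierTitle, gives two independent orderings that both point back at the same underlying structs.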
Pointers also play an important role in allowing us to write programs that can handle any amount of data. Ordinary local variables (data declared inside a function) have two major limitations: (1) the number of variables in a function is fixed at the time you write the program, and (2) variables only live as long as the function.
Dynamic memory allocation allows us to ask for new storage (as much as we want) at runtime. These new "variables" don't have names because they didn't exist when we wrote the program. Instead, we refer to them only by their addresses. In addition, their lifetimes are not bound by any function, so we must explicitly say when we are done with them. Compare these two functions:
    void foo()
    {
        Movie m1;   // A movie struct
        Movie m2;   // Another movie struct

        // Do something
        // ...

        // End of foo -- m1 and m2 go away
    }

    void bar()
    {
        Movie * pm1;   // Just a pointer -- no movies yet
        Movie * pm2;

        pm1 = new Movie;   // Create a movie object somewhere, give me the address
        pm2 = new Movie;   // Do it again

        // Do something
        // ..

        // At this point pm1 and pm2 would go away, but not the movies they point to
        delete pm1;   // Free the movie pointed to by pm1
        delete pm2;   // Same for pm2
    }
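Because a dynamically allocated movie is not tied to the function that created it, a function can build one and hand its address back to the caller. Here is a minimal sketch of that pattern. The names makeMovie and destroyMovie are hypothetical, and the Movie struct is trimmed to two fields.

```cpp
#include <cassert>
#include <string>

// Trimmed-down Movie for this sketch only.
struct Movie {
    std::string title;
    int year = 0;
};

// Hypothetical helper: the Movie is created inside this function but
// outlives it, because it lives in dynamic memory. Whoever receives the
// address is responsible for eventually deleting the movie.
Movie* makeMovie(std::string title, int year) {
    assert(!title.empty());   // sketch-level sanity check
    Movie* pm = new Movie;
    pm->title = title;
    pm->year = year;
    return pm;   // the local pointer pm goes away here; the Movie does not
}

// Hypothetical helper: release a movie created by makeMovie.
void destroyMovie(Movie* pm) {
    delete pm;
}
```

A local Movie could not be returned this way: returning its address would hand the caller a pointer to storage that has already gone away.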
Without dynamic memory allocation there is no way to make more movies on the fly:
    int main()
    {
        // -- Local variables: 99 movies
        Movie m1;
        Movie m2;
        // (imagine 96 more of these...)
        Movie m99;
        // BUT, that is the limit: 99 movies

        // -- Dynamic allocation: as many as I want...
        cout << "How many would you like?";
        int N;
        cin >> N;
        for (int i = 0; i<N; i++) {
            Movie * pm = new Movie;   // Gives me a new address every time through loop
        }
        // At end of the loop, N distinct Movie objects exist in memory
        // (Note: I did not save the addresses, but that is a separate problem)

        // -- What about this?
        //    Does the loop N times, but only one movie exists at a time
        for (int i = 0; i<N; i++) {
            Movie m;   // Essentially, the same single movie each time through loop
        }
    }
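The "separate problem" of saving the addresses can be handled by putting each new pointer into a container as it is created. The sketch below uses std::vector for that, which is my choice and not something the notes prescribe; makeMovies and freeMovies are hypothetical helper names, and the Movie struct is trimmed down.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Trimmed-down Movie for this sketch only.
struct Movie {
    std::string title;
    int year = 0;
};

// Hypothetical helper: create n movies at runtime and keep every address.
std::vector<Movie*> makeMovies(int n) {
    assert(n >= 0);   // sketch-level sanity check
    std::vector<Movie*> movies;
    for (int i = 0; i < n; i++) {
        movies.push_back(new Movie);   // a new address every time through the loop
    }
    return movies;
}

// Hypothetical helper: delete every movie, then forget the stale addresses.
// The vector is passed by pointer so that we modify the caller's vector.
void freeMovies(std::vector<Movie*>* movies) {
    assert(movies != nullptr);
    for (Movie* pm : *movies) {
        delete pm;
    }
    movies->clear();
}
```

After makeMovies(N), every one of the N movies is still reachable through the vector, and a single call to freeMovies gives all of them back.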
In general, there are three kinds of memory in a program:
Automatic memory: local variables declared inside a function. You get a fresh set every time the function is called, and they go away automatically at the end of the function. You can think of the memory as residing inside the function itself. Each one has a name (that you provide) and as a result, the total amount of storage is predetermined at the time you write the program.
Dynamic memory: variables/objects/structs allocated using new. You get a new one each time the program executes the new operation (at runtime). The data resides in a separate part of memory called the heap. Objects only go away when you provide their addresses to delete. The quantity is not predetermined, so these things do not have names -- we handle them by their addresses only.
Static memory: global variables. Used very, very sparingly, since they tend to break modularity and reuse, and make programs hard to understand. Like automatic memory, each variable has a name, but they live for the whole run of the program.
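The three kinds of memory can be put side by side with a small counter example. This is only a sketch: all of the names (callCount, bumpLocal, bumpGlobal, makeCounter, bumpDynamic) are invented for the illustration.

```cpp
#include <cassert>

int callCount = 0;   // static memory: one copy, lives for the whole run

// Automatic memory: local is a fresh variable on every call and
// goes away when the function returns, so nothing is remembered.
int bumpLocal() {
    int local = 0;
    local++;
    return local;    // always 1
}

// Static memory: the global remembers what earlier calls did.
int bumpGlobal() {
    callCount++;
    return callCount;
}

// Dynamic memory: the int outlives this function and has no name;
// the caller holds only its address and must eventually delete it.
int* makeCounter() {
    return new int(0);
}

// Work on a dynamically allocated counter through its address.
int bumpDynamic(int* counter) {
    assert(counter != nullptr);   // sketch-level sanity check
    (*counter)++;
    return *counter;
}
```

Calling bumpLocal twice returns 1 both times, while calling bumpGlobal twice in a fresh program returns 1 and then 2. A counter from makeCounter also counts up across calls to bumpDynamic, but only until we delete it.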