Mark Sheldon's Whirlwind Tour of C
This is not a comprehensive document on
the C programming language; it is a first look for Tufts students
who are already familiar with C++.
Why are we programming in C? It's not just to boost your resumes! C continues to be the defacto systems programming language (though C++ and maybe Rust are making inroads). The Linux, Mac, and Windows operating system kernels as well as most device drivers are all written in C. It has a syntax many find convenient, yet it provides fairly low-level control over what happens in the machine. It also exposes many details of the computer, forcing you to learn more about how the computer works, which is a central mission of this course.
C was created in the early 1970s as a way to write operating system code (specifically Unix) in a language that, unlike assembly language, could be ported easily to new machines, but that allowed data manipulations at nearly the same low, assembly language level. In this, it has been very successful. For those of you who know C++, C will look very familiar. The basic syntax and control constructs are very much the same. This is not an accident: C++ is an extension to C. Before we get too involved, however, why don't we look at a typical program.
Why are we programming in C? It's not just to boost your resumes! C continues to be the defacto systems programming language (though C++ and maybe Rust are making inroads). The Linux, Mac, and Windows operating system kernels as well as most device drivers are all written in C. It has a syntax many find convenient, yet it provides fairly low-level control over what happens in the machine. It also exposes many details of the computer, forcing you to learn more about how the computer works, which is a central mission of this course.
C was created in the early 1970s as a way to write operating system code (specifically Unix) in a language that, unlike assembly language, could be ported easily to new machines, but that allowed data manipulations at nearly the same low, assembly language level. In this, it has been very successful. For those of you who know C++, C will look very familiar. The basic syntax and control constructs are very much the same. This is not an accident: C++ is an extension to C. Before we get too involved, however, why don't we look at a typical program.
Anatomy of a C Program
This is the traditional first program in C.
The type of an object implies a size, or the amount of storage the object requires (and you can ask for the size of any type using the
Declaring a name with
All names in C must be declared before they are used. A declaration that occurs before the definition of a function or variable (i.e., a prototype which does not allocate space or specify a function body) is called a forward declaration. This is useful for organizing input files with utility functions down at the bottom and for mutually recursive definitions.
Top-level definitions (definitions outside of any function body) described as
Normal non-top-level declarations (inside function bodies) allocate space on the stack, and are called automatic. This space is deallocated on exit from the function or program block in which the declaration occurs. Initializations (if any) are performed upon every entry to the block. Automatic variables are visible only within the block in which they are declared.
Here is a little more on
#include <stdio.h>
int main()
{
printf("Hello, world!\n");
} /* main() */
Several things to notice:
main()
There must be exactly one in every runnable program. Tricky note: functions without declared return types are assumed to returnint
. What does this function return? Typeecho $?
to the shell after running it and see! (More recent C standards specify thatmain
returns 0 if there is noreturn
statement. For other functions, using a return value from a non-void function that doesn'treturn
something produces undefined behavior. In older versions of C, thismain
function would return 14.)- Comments
using
/* ... */
//
-style comments were introduced later into C, but don't use them in this course.gcc
will let you use them, but, we'll stick to old-fashioned, native C comments. #include
Incorporates another file into this one. Include files, also known as header files, contain declarations that a program might find useful, typically the interface for a function or library.stdio.h
contains the interface for the standard I/O library. C programmers follow the convention
that header files end in - I/O in C is very different from C++, and generally
somewhat lower level. Most of the input and output you will do in C
will be at the level of individual characters or bytes. Instead of
doing input and output using the
<iostream>
C++ library with its convenient << operator and thecout
andcin
streams, you will use the <stdio.h> C library, with its less convenientprintf
,fgetc
andfgets
functions, etc. and thestdout
andstdin
file pointers. Printf uses a "format string" which is a kind of template — a string literal containing one or more placeholders, yielding a result in which the placeholders are replaced with their corresponding values. For instance,printf("This file is %d lines long and took %f hours to write", 120, 2.4)
will print "This file is 120 lines long and took 2.4 hours to write". - Curly brackets enclose a block, and a function body is
always a block. Variables are supposed to be declared at the
start of a block.
gcc
will let you declare variables anywhere (C++/Java style), but let's stick to the standard. This prepares you for how functions/blocks work when we start to look at assembly language programs. - String constants are enclosed in double quotes, but they
can't be freely concatenated as in Java or Python. Special
characters, like the newline (
'\n'
), are embedded in strings using the escape character (\
).
.h
.
#include <stdio.h>
#include <stdlib.h>
/*
* Convert U.S. Weight to Imperial and International
* Units. Print the results
*/
#define KILOS_PER_POUND .45359
#define POUNDS_PER_STONE 14
void print_converted(int pounds)
{
int stones = pounds / POUNDS_PER_STONE;
int uklbs = pounds % POUNDS_PER_STONE;
float kilos = pounds * KILOS_PER_POUND;
printf(" %3d %2d %2d %6.2f\n",
pounds, stones, uklbs, kilos);
} /* print_converted() */
int main()
{
int us_pounds;
printf(" US lbs UK st. lbs INT Kg\n");
for (us_pounds = 10; us_pounds < 250; us_pounds += 10)
print_converted(us_pounds);
return EXIT_SUCCESS;
} /* main() */
Note the use of #define
to create
compile-time constants. It is conventional to use lower case
for variables and UPPER CASE for compile-time constants. Tokens
beginning with #
are sometimes called
pre-processor directives: they are handled by a
separate program called the C pre-processor, cpp
,
that actually runs before the compiler sees your program. The pre-processor
step is one of the four main steps of compilation of a C program, which
we will learn about more in-depth later in the semester. For now, see below
for some quick examples of what compiling a C program looks like.
Declarations
The previous examples show a variety of data and function declarations. A declaration in C defines a name, and its properties. Some declarations only introduce type names, but most introduce a name for an actual object. The properties of object names include their type (e.g.,int
and Function
returning int
), the status of their storage
requirements, and their visibility.The type of an object implies a size, or the amount of storage the object requires (and you can ask for the size of any type using the
sizeof
operator). The other properties are determined
by a rather complicated set of rules. Data declarations can include
initializations, but there are some restrictions.Declaring a name with
extern
says that the compiler
should not allocate any space for it because the name is defined (and
its space is allocated) externally to this program file, i.e.,
it will come from somewhere else during program linking.
For example, the following declarations tell the compiler that there
will be an integer variable named num_donuts
and a
function named doh
supplied by
some module — this line does not set aside any space
for the variable:
extern int num_donuts;
extern int doh();
A definition in C causes the compiler to allocate space (or
produce code to allocate space) for something. The above lines
declare, but do not define, a variable
named num_donuts
and a function named doh
.
You've seen function declarations like this before.All names in C must be declared before they are used. A declaration that occurs before the definition of a function or variable (i.e., a prototype which does not allocate space or specify a function body) is called a forward declaration. This is useful for organizing input files with utility functions down at the bottom and for mutually recursive definitions.
Top-level definitions (definitions outside of any function body) described as
static
are only visible in the current
file: they cannot be linked to by other code modules. (They are
statically allocated whether so labelled or not.) Top-level
initializations must use constant (compile-time computable) data.Normal non-top-level declarations (inside function bodies) allocate space on the stack, and are called automatic. This space is deallocated on exit from the function or program block in which the declaration occurs. Initializations (if any) are performed upon every entry to the block. Automatic variables are visible only within the block in which they are declared.
Here is a little more on
extern
and static
declarations.
Compiling a C Program
While the act of compiling a C program has some similarities to
compiling C++, there are a couple important differences.
To compile a single, standalone source code file called
And then we link the object files together:
- Compiler: The compiler that you will use in Comp 40
is
gcc
, as opposed tog++
orclang++
that are used for C++ code. - File Extension: C source code files conventionally end
with
.c
(but keep in mind that header files still end with.h
).
-Wall
, -Wextra
, etc.)
Here are some quick examples of compiling some C projects. These
should look similar in some ways to what it would look like for a C++
project.To compile a single, standalone source code file called
hello.c
:
gcc -Wall -g -o hello_world hello.c
This will produce an executable called hello_world
.
Now imagine we had two files, A.c
and B.c
that work together. First we compile them to object files:
gcc -g -c -o A.o A.c;
gcc -g -c -o B.o B.c;
(Note: -c
tells gcc
to compile
only and generate a .o
file rather than try to
make an executable program.)And then we link the object files together:
gcc -Wall -g -o my_program A.o B.o
which produces an executable my_program
.
C Base Types
C only supports four built-in data types:
Floating point numbers (float or double) are
probably the same as whatever you are used to. We will discuss floating point
numbers at the low level later in the course.
Characters are assumed to be unsigned, one-byte, integers. Character constants come between single quotes, e.g.,
You will notice some omissions: There are no strings or boolean values. There is special support for string constants, but don't expect them to work the way they do in C++! We will discuss “C strings” in greater detail later.
The C99 standard introduced
In C,
The type name
In C,
If a function prototype contains the argument list
int
An integer that is the size of the local machine's word.float
Single precision floating point number.double
Double precision floating point number.char
A character is one byte in size. More recent C standards include wide characters, but we probably won't see much of them. In fact, a character is really just a special size of integer from C's point of view.
short
, long
, and
unsigned
flavors. Integer constants look normal,
most of the time. 123L
forces the integer to be
long
. Integer constants beginning with a
0
are interpreted as being in octal, which
is base 8. Integer constants that begin with a
0x
or 0X
are interpreted as
hexadecimal (base 16) numbers. An unsigned integer is one that
is meant to take on positive variables, i.e. for a size. We'll learn much
more about the difference between signed and unsigned integers later in
the course.
Caution: The treatment of leading zeros can sometimes be
confusing. For example, when initializing a table of
identification numbers (maybe part numbers in an inventory or
employee numbers for a payroll program), it can be tempting to
make all the numbers the same size for readability. But
10 is not the same as 010
(the latter is the same as 8 .
|
Characters are assumed to be unsigned, one-byte, integers. Character constants come between single quotes, e.g.,
'a'
, and include a variety of escape
sequences for special characters
like
'\n'
for newline. Character constants are automatically
promoted to the integer value of the corresponding character
in the ASCII sequence. This gives meaning to character comparisons
such as ((c > 'a') && (c < 'z'))
. It is also relied
on for various input and output operations as we'll see later. It also
means that you can do arithmetic on characters: ('z' - 'a') == 25
if we're using ASCII.You will notice some omissions: There are no strings or boolean values. There is special support for string constants, but don't expect them to work the way they do in C++! We will discuss “C strings” in greater detail later.
The C99 standard introduced
stdbool.h
which you
can #include
. It defines the type
name bool
as well as true
and false
. However, you still have to understand
the following.In C,
0
is false; all other numeric values are
considered to be true. Boolean operators, like
&&
return 0
for false and
1
for true. And if
is treated as if
it compared its test value against zero:
if (test) ...
is the same as
if ((test) != 0) ...
It's common (unfortunately) to use this property of integers,
integer expressions, and pointers (though the case with
pointers is a bit complicated under the covers):
while (things_to_do) {
...
things_to_do--;
}
Modern usage discourages this shorthand for
clearer formulations:
while (things_to_do > 0) {
...
things_to_do--;
}
There is no string type in C, though there
are string constants. As we shall see below, a string is the
same as an array of characters (bytes) terminated by the null
character '\0'
(in ASCII called NUL
)
whose value is guaranteed to be zero. The '\0'
acts as a sentinel character, that is, starting at the first
character in a string, adjacent bytes will be treated as part
of the same string until a '\0'
is in the array.
For example, the string
"foo"
represents a sequence of four characters:
'f'
, 'o'
, 'o'
,
'\0'
. If you write a string constant, the
terminating '\0'
is inserted for you by the
compiler. When you manipulate strings, however, you need to
be aware of its existence. (Note: The ASCII NUL
character means '\0'
. It is not the same as
the NULL
, with two L's, pointer. These are
different.)void
The type name void
has three distinct uses. In C,
void
as the return type of a
function means that the function is only to be called for side
effect and must not be used in a context where a return value
is expected.If a function prototype contains the argument list
(void)
, then the function takes no arguments
and the compiler will not let you supply any. This is
different from an empty argument list in a function prototype
which is taken to mean 'I don't want to say what the arguments
are, allow any number and trust me.'void
as the target of a pointer type has
another meaning, which we'll discuss below.
Derived/Aggregate Data Types
C provides 4 derived/aggregate data types:
C guarantees that array elements are stored in successive memory locations (which is important when we talk about pointers and arrays). Also, C has no built-in support for array bounds checking.
There are no multi-dimensional arrays, as such, in C: one simply uses an array with an array at each element:
The following code fragment prints out the values in our
In C, the size of an array must be specified when its space is allocated, and its size cannot change. This sort of declaration for a string is therefore common when allocating a static area for a string. It is also common when creating a buffer into which characters will be placed for parsing/processing.
You can leave out an array size if you have an explicit initialization:
You can get the address of a variable by using the
There is a distinguished pointer,
One use of arrays and pointers is important to most C programs: in the argument list. C assumes that every
The following code uses two different mechanisms (both common in practice) to print out the command line arguments passed to the program:
Thus, passing an integer array
Array elements are in successive memory locations, so if you know the address of one element, the address of the next one is the sum of the current address and the size of an array element. Further, arithmetic on pointers, including increment and decrement, adjusts the pointer by the size of the data pointed to. (You can find out the size of a datatype by using the
In the case of strings, which are sequences of characters, it is conventional to use the declaration
Note: Declaring an array allocates space for the array; declaring a pointer to a type only allocates space for the pointer.
This program also illustrates how a string constant can be used like an array and how array subscripting is commutative! That latter property is a consequence of the fact that C defines array subscripting in terms of address arithmetic. Given the following code:
Don't use
Refer to the code in
The fact that the array and pointer cases are different makes sense when you consider the storage allocation issues. Declaring a structure component that is an array allocates the space for the elements in the structure. This is possible, because you told the compiler how many slots there are and how big each slot is. A pointer declaration only tells the compiler there is a pointer, so only space for the pointer is allocated. An implication of this is that a pointer in a structure implies that memory will be allocated somewhere else for that component.
By default, the first enumerated name has the value 0, and each successive name is one more than the previous one. This can be changed through explicit assignment:
A very common definition among C programmers is:
No name may be defined in common between two enumerations.
- Arrays
- Pointers
- Structures
- Unions
- Enums
Arrays
An array is a sequence of values of a specified type:declares a sequence of 256 integers indexed from#define MAX_PARTS 256
/* ... */
int part_numbers[MAX_PARTS];
0
through 255
.C guarantees that array elements are stored in successive memory locations (which is important when we talk about pointers and arrays). Also, C has no built-in support for array bounds checking.
There are no multi-dimensional arrays, as such, in C: one simply uses an array with an array at each element:
double soil_samples[100][500]
is an array of 100 elements. Each of those elements is itself
an array of 500 double precision floating point elements.The following code fragment prints out the values in our
soil_samples
matrix with one row (of 500
elements) per line:
for (row = 0; row < 100; row++) {
for (col = 0; col < 500; col++)
printf("%f\t", soil_samples[row][col]);
printf("\n");
}
Using array notation, you can declare a variable that represents
a string of some maximum length like this:
char pathname[MAX_PATH_LEN];
In C, the size of an array must be specified when its space is allocated, and its size cannot change. This sort of declaration for a string is therefore common when allocating a static area for a string. It is also common when creating a buffer into which characters will be placed for parsing/processing.
You can leave out an array size if you have an explicit initialization:
char hello[] = "hello";
static int backward_digits[]
= {9, 8, 7, 6, 5, 4, 3, 2, 1, 0}
Arrays cannot be passed to or returned from functions,
though pointers to them can (more on pointers below). That
is, if backward_digits is an int array as initialized above,
and you pass it to a
function reverse_array
, reverse_array
receives a pointer to the beginning of the array. As
reverse_array
receives a pointer to the original
array, and not a copy, it can modify the array it is passed.
/*
* Modifies arr with length arr_len to have its elements stored
* in reverse order
*/
void reverse_array(int arr[], int arr_len) {
int i;
int tmp;
for (i = 0; i < arr_len / 2; i++) {
tmp = arr[i];
arr[i] = arr[arr_len - i - 1];
arr[arr_len - i - 1] = tmp;
}
}
int main() {
int backwards_digits[] = {9, 8, 7, 6, 5, 4, 3, 2, 1, 0};
int i;
reverse_array(backwards_digits, 10);
for (i = 0; i < 10; i++) {
printf("%d\n", backwards_digits[i]);
}
return 0;
}
Pointers
A pointer value is a memory address. Pointers are a primitive kind of data value and are central to programming in C, though they have a derived type (i. e., the type of pointer depends on the type of thing the pointer refers to). Here is the declaration of a variable whose type is pointer-to-integer, i. e., this variable can contain a pointer value that is the address of a variable that contains an integer value.int *int_ptr;
You get the value pointed to by such a variable by
dereferencing the pointer. So int_ptr
contains a pointer to an integer (address of an integer), and
*int_ptr
contains the integer it points to. We can
assign to pointer variables, and until you get very used to
this, I recommend that you draw pictures to represent your
pointer structures. For example, int_ptr = p;
does not change the memory the location int_ptr
was pointing to: it makes int_ptr
point to some
new place in memory. *int_ptr = 3;
changes the
contents of the memory location whose address is stored in
int_ptr
.You can get the address of a variable by using the
&
operator.
int size;
int *p = &size;
Look at
pointer_example.c
.
Declarations of pointers are a source of confusion to
students at first. The logic is that type (and optional
storage class) appears first followed by a list of variable
expressions. The variable expressions use the declared
variable in an expression that shows how you use it to get to
the declared type. Thus, the declaration of int_ptr
above should be understood to mean “dereferencing
int_ptr
(*int_ptr
) will yield an
int
.” There is a distinguished pointer,
NULL
or
0
that is guaranteed not to point to any object.
So it is common to use this pointer to terminate lists or,
when used as a return value of a function, to indicate
failure. It is not guaranteed that the value of the null
pointer is zero, only that writing a constant 0
will be compiled to the null pointer (in fact,
NULL
is really #define
d to be
0
).NULL
is a perfectly legal pointer, but it is
an error to try to dereference it.One use of arrays and pointers is important to most C programs: in the argument list. C assumes that every
main()
is actually a function of two
arguments:
int main(int argc, char *argv[])
where argc
is the number of arguments (the argument count) specified
on the command line that invoked the program, and
argv
is an array (the argument vector) of strings,
one for each input argument. argv[0]
is the name
of the program being invoked, and argv[1]
up to
argv[argc-1]
are the rest. C also guarantees
that argv[argc]
is the NULL
pointer:The following code uses two different mechanisms (both common in practice) to print out the command line arguments passed to the program:
#include <stdio.h>
void print_args0(int argc, char *argv[])
{
int i;
for (i = 0; i < argc; ++i)
printf("%s ", argv[i]);
putchar('\n');
}
void print_args1(char *argv[])
{
while (*argv != NULL)
printf("%s ", *(argv++));
printf("\n");
}
int main(int argc, char *argv[])
{
printf("Method 1: ");
print_args0(argc, argv);
printf("Method 2: ");
print_args1(argv);
return 0;
}
The type pointer
to void
, written (void *)
, has
special meaning. It represents a pointer to an unknown type,
and C guarantees that any pointer may be converted to
a (void *)
and back without any loss of
information. This pointer type is typically
used as a kind of abstraction device: functions can take and
return pointers to an abstract data value without revealing to
a client what the value actually looks like. We will discuss
and use void pointers more later on in the course.
Pointers, Arrays, Strings, and Address Arithmetic
C guarantees and exposes certain low-level details about how memory and data structures are laid out. An unsubscripted use of an array name is interpreted as the address of the first array element (except thatsizeof()
will yield the
size of the array and not the size of the first element).Thus, passing an integer array
a
to a function
will convert a
from type array
of int
to type pointer
to int
. The formal parameter can thus be
declared either as int a[]
or int
*a
. However, array types are different from
pointer types.Array elements are in successive memory locations, so if you know the address of one element, the address of the next one is the sum of the current address and the size of an array element. Further, arithmetic on pointers, including increment and decrement, adjusts the pointer by the size of the data pointed to. (You can find out the size of a datatype by using the
sizeof()
operator.) The address of an
element of an integer array arr
with index
i
is therefore equivalent to arr + i
which has the value arr + (i * sizeof(int)).In the case of strings, which are sequences of characters, it is conventional to use the declaration
char *s;
to refer to a string, unless the size of the string or the
buffer containing the string is an issue. Note: Declaring an array allocates space for the array; declaring a pointer to a type only allocates space for the pointer.
int i;
char *s = "Hello!";
for (i = 0; s[i] != '\0'; i++)
s[i] = tolower(s[i]);
converts the string in s
to lower case. (Actually,
there is a subtle bug — we'll come back to that.) But so
does the function str_lower()
in the following program:
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
/* Convert s to lower case, updating in place. */
char *str_lower(char *s)
{
char *stmp = s;
while (*stmp != '\0') {
*stmp = tolower(*stmp);
stmp++;
}
return s;
}
int main(int argc, char *argv[])
{
char *oh = "Oh";
char hello[15] = "Hello";
char there[] = "there";
int idx = (argc > 2) ? atoi(argv[2]) % 5 : 1;
/* Don't use atoi() */
printf("%s, %s %s, %s%c\n",
oh, str_lower(hello), there,
(argc > 1) ? argv[1] : "you",
idx[".!?:;"]);
return EXIT_SUCCESS;
}
This program illustrates quite a few interesting things.
There are a variety of ways to declare and initialize arrays
of characters (strings). Interestingly, they are not all
exactly the same: str_lower(oh)
produces a
segmentation fault on my computer, but only under certain
circumstances. Why? (String literals are, in many
implementations, stored in read-only storage, and they cannot
therefore be modified. The C standard says that the result of
modifying a string literal is undefined, which means that
implementations can do whatever they like. String literals
are supposed to be immutable. In CS 40
parlance, its an “unchecked runtime error” to
modify one.)This program also illustrates how a string constant can be used like an array and how array subscripting is commutative! That latter property is a consequence of the fact that C defines array subscripting in terms of address arithmetic. Given the following code:
int index = 5;
char s[] = "Halligan";
The following C expressions are all equivalent:
'g';
s[index]; // this is the preferred notation- don't use the following
*(s + (index * sizeof(char)));
*((index * sizeof(char)) + s);
index[s];
There are also two occurrences of the ternary operator for
conditional expressions, which has nothing to do with arrays
or pointers or strings, but is cool nonetheless.Don't use
atoi()
.
Structures
Structures in C are data aggregates, like arrays. However, there are two key differences: components have names rather than integer indices; and they are heterogeneous, i.e., they can contain values of different types.struct employee {
char name[MAX_NAME_LEN];
int number;
float salary;
};
is a structure of 3 fields: a string for the employee's name,
an integer for the employee number, and a floating point
number for their salary. One extracts or sets a field in a
structure with dot notation:
struct employee an_employee;
an_empoyee.number = 1234;
Declaring structures can seem a bit strange. The name of the
structure is optional, and these names inhabit a separate
namespace from other type names. If you leave out the name,
then you have an anoynmous structure definition. We can use
the employee structure above directly in a declaration (and we
could leave out the name):
struct {
char name[MAX_NAME_LEN];
int number;
float salary;
} an_employee;
or, given the named structure declaration above, we could
declare an array of employees:
struct employee sales_personnel[300];
You might like to use the name employee
as a
type, but you can't. That is why you will often see
declarations like the struct employee
followed
immediately by a type declaration
typedef struct employee employee_t ;
which allows employee_t
to be used like any type
name (it's a synonym
for struct employee
):
float give_raise(employee_t emp, float pct_incr);
You might think that, by analogy with arrays, a name of a
structure (e.g., emp
inside the
give_raise()
function above) would be equivalent
to a pointer to the first location of the structure. That
would be wrong. A structure has the same status as an integer
or a pointer in that a name (or expression) with a structure
type stands for the entire structure value. That means that
when you pass a structure to a function as an argument, the
bytes are copied into new storage (on the stack) for a new
structure that will be manipulated by the function. An
assignment of a structure to a structure variable similarly
copies the structure's contents into the structure denoted by
the variable on the left of the assignment. Refer to the code in
struct_test.c
. This
program demonstrates that a structure that contains a
fixed-sized array is not considered the same as one that
contains a pointer to the array's type and that structure
values are copied on assignment or when passed as arguments to
a function. The fact that the array and pointer cases are different makes sense when you consider the storage allocation issues. Declaring a structure component that is an array allocates the space for the elements in the structure. This is possible, because you told the compiler how many slots there are and how big each slot is. A pointer declaration only tells the compiler there is a pointer, so only space for the pointer is allocated. An implication of this is that a pointer in a structure implies that memory will be allocated somewhere else for that component.
Programming with Structures
The Big Ideas of abstraction, modularity, and divide/conquer/glue are always applicable; in C you must work extremely hard to use them well. One can think of a structure as a way to glue data values together, but this is too simple. They are that, but you should think of them as a basis for a data abstraction. Any structure that is central to your program (used by more that one module) and is not used merely as a way to pass arguments and return results should have a set of functions for manipulating the structure defined in a.c
file that can be separately
compiled. Then you'll want a .h
file containing
the interface specification. In general, the actual structure
type should be hidden, and we'll see some examples of how to
do this in the class.
Enumerations
Enumerations are useful for declaring a set of named constants. They are often used in place of#define
d constants. For example,
enum code_quality {good, bad, ugly} my_program;
declares the variable my_program
to have the type
enum code_quality
. I can then test and assign
any one of the enumerated names to the variable:
my_program = ugly;
C exposes the fact that the listed identifiers are actually
integers (whose size is implementation-dependent), and they
may be freely used in integer contexts (testing them for
equality is a natural place). The above declaration therefore
defines 5 names: the type enum code_quality
; the
integer variables good
, bad
, and
ugly
; and the variable my_program
of
type enum code_quality
.By default, the first enumerated name has the value 0, and each successive name is one more than the previous one. This can be changed through explicit assignment:
enum code_quality {good = 1, bad, ugly = 0} my_program;
The above code creates an enumeration data type in which
good
has the value 1
,
bad
has the value 2
(if there is no assigned
value, it is one more than the previous value), and
ugly
has the value 0
.A very common definition among C programmers is:
typedef enum {FALSE = 0, TRUE = 1} bool;
which allows us to declare a variable or return type to
be bool
.No name may be defined in common between two enumerations.
Function Call and Return
Syntactically, a function call is an expression or statement
in which an expression is followed by zero or more
comma-separted expressions in parentheses. The easy case is
when we refer to a function by
name:
In order to alter the values of the caller's variables, we need call-by-reference parameters, which we hand code in C using pointers. That is, we manually pass in pointers to the caller's variables to functions (often using the & operator). Call-by-reference parameters may also be used for both input and output, and so the same technique is used in C to for multiple value return. In the above example, one can view the reference parameters as being used for both input and output.
factorial(n)
.
Here, factorial
is an identifier that,
presumably, is the name given to a defined function. The
value n
is the argument or actual
parameter.
exprfun
Because the values of the arugments are copied onto the stack,
C is a call-by-value language. This means that
variables in the function are stored in different locations
than variables in the caller: no function can directly refer
to the local/stack-allocated variables in another function. (
expr1, ...)
- exprfun is evaluated. It must produce a function pointer (the address of the first instruction in the compiled function definition).
- The argument expressions expri are evaluated left-to-right.
- Create a new stack frame or activation record on the stack with enough space for the return address (the address of the instruction to be executed after the function completes), the argument values, the return value, and anything else the compiler needs to put there.
- Fill in the stack frame by copying the values of the arguments, etc.
- The function code is run with the new stack frame by updating the stack pointer and jumping to the first instruction of the function.
- If and when the function the returns, its return value is placed on the stack where the caller can extract it and then the code jumps to the return address saved during the function call.
- The return value (if one) is used as the value of the function and the stack frame is destroyed.
In order to alter the values of the caller's variables, we need call-by-reference parameters, which we hand code in C using pointers. That is, we manually pass in pointers to the caller's variables to functions (often using the & operator). Call-by-reference parameters may also be used for both input and output, and so the same technique is used in C to for multiple value return. In the above example, one can view the reference parameters as being used for both input and output.
add_and_multiply.c
(shown below) illustrates the use of multiple reference
parameters for output. This overcomes the restriction that in
C, a function must return either no values (it's return type
is void
) or one value.
#include <stdio.h>
int add_and_multiply(int x, int y, int * prod_p) {
*prod_p = x * y;
return x + y;
}
int main() {
int sum;
int prod;
sum = add_and_multiply(5, 7, &prod);
printf("5 + 7 = %d, 5 * 7 = %d", sum, prod);
return 0;
}
Assigning and casting pointers
The explanation above of pointers leaves out
some important techniques that you will use often, and
especially when allocating
memory. Specifically, you will often
need to convince C that the type of a pointer has
changed. One common example mentioned briefly above is converting a pointer to or
from
There are two main ways of converting pointers: assigning to a different pointer type, and casting. The sections below describe both of these techniques.
Passing arguments to functions follows the same rules. So:
By the way, there's no obvious reason to use
void *
, but other conversions are possible
too. There are two main ways of converting pointers: assigning to a different pointer type, and casting. The sections below describe both of these techniques.
Pointer assignment and conversion
The following code is not legal in C:int *ip;
float *fp;
/* ...assume ip1 has been initialized ... */
/* WON'T COMILE! The compiler assumes that if ip points
to memory containing an integer then treating
that same memory as a float is a mistake */
fp = ip1;
However, and perhaps surprisingly, the following
code is legal C:
void *vp;
int *ip1;
int *ip2;
float *fp;
/* ...assume ip1 has been initialized ... */
/* the following causes ip2 to point to the same
integer as ip1 */
vp = ip1; /* can assign void * from any ptr type */
ip2 = vp; /* can assign any ptr type from void */
/* The following also will compile and run, but
is very unlikely to be correct! Originally
ip1 points to some integer. Once the following
code runs, fp points to the same memory, but
assumes it's a float. That's almost always
a mistake, but the compiler won't catch it! */
fp = vp;
The above shows that one can assign any pointer
type to a void pointer, and vice versa. Furthermore, as shown
on the last line, this allows us to make a mistake equivalent
to the one that was caught in the earlier example. We've
effectively assigned an int pointer to a float pointer. Very
rarely there are good reasons for doing such things; usually
it's an error, and the compiler won't catch it!Passing arguments to functions follows the same rules. So:
void takes_intp(int *ip_parm);
void takes_voidp(void *vp_parm);
void *vp;
int *ip1;
int *ip2;
float *fp;
takes_intp(ip1); /* works (of course!) */
takes_intp(vp); /* works */
takes_intp(fp ); /* doesn't compile */
takes_voidp(ip1); /* works */
takes_voidp(vp); /* works (of course!) */
takes_voidp(fp ); /* works */
Casting pointers
In some of the examples above, the type of a pointer was succesfully changed by an assignment or when passing to a function. Another way to change the type of a pointer is with a cast. The syntax of a cast in C involves putting the desired type in parenthesis ahead of the expression to be cast:(type)expression
The cast does not permanently affect the type or
contents of the pointer; casting can be thought of as a
unary operator (similer to -
or
sizeof). The cast operator takes in a
pointer and returns a pointer to the same spot in memory,
but with the desired type.
For example:
(int *)vp;
Here's how that might be used in a C program:
void *vp;
int i = 3;
vp = &i; /* address as void * */
printf("The integer is %d\n", *((int *)vp));
This prints:
The integer is 3
The cast is the int *
in the argument
to printf
. Again, the vp
variable is not
changed by the cast in the printf
; it is just
treated at that spot as having the type int *
instead
of void *
. Thus, when it's dereferenced this way:
*((int *)vp
, the result is an int
that
can be printed.By the way, there's no obvious reason to use
void *
in the code above. This simple example
would work better with int *
and no cast needed:
int *ip;
int i = 3;
vp = &i; /* address as int * */
printf("The integer is %d\n", *ip);
...but the first example let's us see casting in
action.
Aside: Numeric casts
In this part of the tutorial we are focused mainly on pointers but it's worth noting that casts can also be used with numeric types:#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
int i = 3;
printf("Floating division: %f Integer division: %d\n",
(float)i / 2, i /2);
return EXIT_SUCCESS;
}
This program prints:
Floating division: 1.500000 Integer division: 1
Memory allocation
In C++ new objects are created with the
The model in C is very different: you don't construct objects, instead you allocate memory. You have to write code that runs after the allocation to do any necessary initialization, and you must manually handle any typing that is involve. This program allocates 1,000,000 bytes of memory, has a comment in place of code that might use the memory, and then frees the memory:
Often you will want to allocate memory for a structure. Here's an example of how to do it. Indeed, this code makes two allocations: both of them do the same thing, but they illustrate two different ways of specifying the size to be allocated.
new
operator. That operator allocates memory, but it does much
more: it calls the constructor method on every member of the
class being created, and on the class itself. As
discussed above, C does not have
classes at all. It has struct
, but those don't
have constructors or in fact any methods.The model in C is very different: you don't construct objects, instead you allocate memory. You have to write code that runs after the allocation to do any necessary initialization, and you must manually handle any typing that is involve. This program allocates 1,000,000 bytes of memory, has a comment in place of code that might use the memory, and then frees the memory:
#include <stdio.h>
#include <stdlib.h>
#define SIZE 1000000
int main(int argc, char **argv)
{
char *p;
/* Attempt to allocate SIZE bytes of memory */
p = malloc(SIZE);
/* See if it worked */
if (p == NULL) {
fprintf(stderr, "Failed to malloc %d bytes of memory\n", SIZE);
return EXIT_FAILURE;
}
/* Print message showing address of allocated memory */
fprintf(stderr, "Successfully allocated %d bytes of memory at address %p\n",
SIZE, p);
/* .... use the memory here */
/* free the memory */
free(p); /* free does not returne a value */
return EXIT_SUCCESS;
}
If you don't free the memory, then your
program leaks memory. That is, it leaves memory in
use. Although the operating system will tend to clean up after
programs that leak, you should get in the habit of freeing
everything you allocate! In CS 40, your grade will
suffer significantly if we find the your code
leaks. The valgrind
tool is an excellent way to
find leaks. Another hint: when you are developing code, as
soon as you code a malloc
, code the
corresponding free
! malloc
returns a void *
. Using the
techniques should above you can easily assign that to a typed
pointer (may be a char *
, for example, if you are
planning to store characters), or you can cast the pointer.Often you will want to allocate memory for a structure. Here's an example of how to do it. Indeed, this code makes two allocations: both of them do the same thing, but they illustrate two different ways of specifying the size to be allocated.
#include <stdio.h>
#include <stdlib.h>
struct student {
char name[20];
int age;
float gpa;
};
int main(int argc, char **argv)
{
struct student *student1_p;
struct student *student2_p;
/* Allocate space for a student */
student1_p = malloc(sizeof(struct student));
if (student1_p == NULL) {
fprintf(stderr, "Failed to malloc %u bytes of memory for student 1\n",
sizeof(struct student));
return EXIT_FAILURE;
} else {
fprintf(stderr, "Successfully allocated %u bytes for student 1 at address %p\n",
sizeof(struct student), student1_p);
}
/* Norman Ramsey prefers this way, because you always get the
right size for whatever the declaration was for student2_p */
student2_p = malloc(sizeof(*student2_p));
if (student2_p == NULL) {
fprintf(stderr, "Failed to malloc %u bytes of memory for student 2\n",
sizeof(*student2_p));
return EXIT_FAILURE;
} else {
fprintf(stderr, "Successfully allocated %u bytes for student 2 at address %p\n",
sizeof(*student2_p), student2_p);
}
/* ...do some work here */
free(student1_p);
free(student2_p);
return EXIT_SUCCESS;
}
}
Here's an example of output from that program:
Successfully allocated 28 bytes for student 1 at address 0x166f010
Successfully allocated 28 bytes for student 2 at address 0x166f040