This document collects some idiomatic examples of the C way of doing things. None of these examples have been tested. Please report errors or difficulties to comp40-staff.
The idea is to separate out the processing of an open file handle from the process of finding one or more open file handles. Idiom adapted from Kernighan and Ritchie, page 162, for a program without options:
#include <stdio.h>
#include <stdlib.h>

extern void do_something(FILE *);
int main(int argc, char *argv[]) {
if (argc == 1) {
do_something(stdin);
} else {
for (int i = 1; i < argc; i++) {
FILE *fp = fopen(argv[i], "r");
if (fp == NULL) {
fprintf(stderr, "%s: Could not open file %s for reading\n",
argv[0], argv[i]);
exit(1);
}
do_something(fp);
fclose(fp);
}
}
}
Idiom for OS-specific error message on failing to open a file:
perror(argv[i]); // print the filename with message about *why* fopen() failed
Idiom for elaborate error message:
...
fprintf(stderr, "%s: could not open %s (%s)\n",
argv[0], argv[i], strerror(errno));
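The elaborate message can be wrapped in a small helper so that every call site reports failures the same way. This is only a sketch: `open_for_reading` is a hypothetical name, not part of the original program, and it relies on `fopen` setting `errno` on failure, which POSIX promises but ISO C does not.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open `path` for reading, or explain on stderr
 * why that was not possible. `progname` is normally argv[0].
 * Returns NULL on failure so the caller can decide whether to exit. */
FILE *open_for_reading(const char *progname, const char *path)
{
    assert(progname != NULL && path != NULL);
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        int saved = errno;  /* copy errno before another call can change it */
        fprintf(stderr, "%s: could not open %s (%s)\n",
                progname, path, strerror(saved));
    }
    return fp;
}
```

A caller would write `FILE *fp = open_for_reading(argv[0], argv[i]);` and then test `fp` against NULL as in the loop above.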
The correct idiom for reading input is to
Allocate a buffer
Call fgets
Allocation may be static or dynamic. The main issue is how to recover if fgets does not return an entire line. Assuming you can't just halt the program with an error message, these are your options:
If your buffer is dynamically allocated, you can enlarge it and continue to read.
If your buffer is statically allocated, you should
Here are some things to avoid when doing input one line at a time:
Never use gets; it's unsafe.
Don't use scanf, especially not for interactive programs. It's too easy for scanf to become greedy and gobble up more than one line, especially if the input doesn't meet specifications.
If you have the urge to use the scanf interface, which can be quite useful, use fgets to read the line and then sscanf (note the extra 's') to read the pieces.
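Here is one way the dynamic-buffer option and the fgets-then-sscanf advice might fit together. It is a sketch under assumptions: `read_line` and `parse_size` are hypothetical names, the starting capacity of 16 is arbitrary (chosen small so the growth path is exercised), and the "width W height H" line format is invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length into a heap buffer.
 * Keeps the trailing '\n' if there is one. Returns NULL at end of file
 * or if memory runs out. The caller frees the result. */
char *read_line(FILE *fp)
{
    assert(fp != NULL);
    size_t cap = 16;                 /* arbitrary small starting size */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            return buf;              /* fgets returned an entire line */
        if (len + 1 == cap) {        /* buffer is full: enlarge, keep reading */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
    }
    if (len > 0) {                   /* last line had no newline */
        buf[len] = '\0';
        return buf;
    }
    free(buf);
    return NULL;
}

/* Hypothetical parser: pick the pieces out of an already-read line.
 * Returns 1 if both numbers were found, 0 otherwise. */
int parse_size(const char *line, int *width, int *height)
{
    assert(line != NULL && width != NULL && height != NULL);
    return sscanf(line, "width %d height %d", width, height) == 2;
}
```

Because sscanf only ever sees one line, a malformed line cannot cause it to consume input that belongs to the next line.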
Gotcha alert! The C++ strings you are used to do not exist in C. In C, you simulate a string by a char * pointer, which points to a sequence of bytes ending in '\0'. This style causes all sorts of problems, most notably
const char *s1, *s2; // strings in the neighborhood
if (s1 == s2) { // SILENTLY GIVES WRONG ANSWERS
... // go here only if *pointers* are identical
}
The standard idiom for comparing strings is
if (strcmp(s1, s2) == 0) {
... // strings are equal here
}
The seasoned C programmer often writes
if (!strcmp(s1, s2)) {
... // strings are equal here
}
The exclamation mark is easy to overlook.
Problem: comparing equal strings costs time proportional to the length of the string.
Hanson's Atom_new or Atom_string functions use a shared hash table to ensure that equal strings are represented by identical pointers. A single Atom_new is more expensive than a single strcmp, but when you are using strings in data structures, you will recover the cost by saving comparisons down the road. You may also save memory. One idiom is
const char *s1, *s2; // strings in the neighborhood
s1 = Atom_string(s1); // hash string to a unique pointer
s2 = Atom_string(s2); // likewise
...
if (s1 == s2) {
... // strings are guaranteed equal
}
The behavior is neatly expressed mathematically:
Atom_string(s1) == Atom_string(s2) if and only if strcmp(s1, s2) == 0.
You use Atom_new if you want to create atomic strings that contain zero bytes, as you might in some binary network protocols.
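To see why equal strings can end up with identical pointers, here is a deliberately simplified sketch of interning. It is not Hanson's implementation: `intern_sketch` is a hypothetical name, and it searches a linked list where the real Atom interface uses a hash table, so each lookup here costs time proportional to the number of strings already seen.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One entry per distinct string seen so far. */
struct interned {
    struct interned *next;
    char *text;
};

static struct interned *seen = NULL;   /* grows for the life of the program */

/* Hypothetical stand-in for Atom_string: return the one canonical
 * pointer for the characters in s, making a private copy on first sight. */
const char *intern_sketch(const char *s)
{
    assert(s != NULL);
    for (struct interned *p = seen; p != NULL; p = p->next)
        if (strcmp(p->text, s) == 0)
            return p->text;            /* seen before: reuse the pointer */

    struct interned *entry = malloc(sizeof(*entry));
    assert(entry != NULL);
    entry->text = malloc(strlen(s) + 1);
    assert(entry->text != NULL);
    strcpy(entry->text, s);
    entry->next = seen;
    seen = entry;
    return entry->text;
}
```

The strcmp calls happen once, at interning time; afterwards every comparison between interned strings is a single pointer comparison.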
This idiom is notable for its simple control flow. It can be generalized to other separators besides commas and other things to print besides strings.
...
const char *prefix = "";
for (int i = 0; i < nthings; i++) {
printf("%s%s", prefix, things[i]);
prefix = ", ";
}
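The generalization to other separators might look like the sketch below. The function name `print_separated` and its parameters are assumptions made for illustration; the control flow is the same prefix trick as above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical generalization: write things[0..nthings-1] to `out`,
 * with `sep` between neighbors and nothing before the first
 * or after the last. */
void print_separated(FILE *out, const char *sep,
                     const char *const things[], int nthings)
{
    assert(out != NULL && sep != NULL);
    const char *prefix = "";           /* nothing precedes the first item */
    for (int i = 0; i < nthings; i++) {
        fprintf(out, "%s%s", prefix, things[i]);
        prefix = sep;                  /* every later item gets the separator */
    }
}
```

There is no special case for the first or last element and no test inside the loop, which is what keeps the control flow simple.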
List_map to print strings separated by commas
For a list we'd like to write
const char *prefix = "";
foreach name in list { // the iteration abstraction does not exist in C
printf("%s%s", prefix, name);
prefix = ", ";
}
Using the List_map interface, name will be pointed to by a parameter, and prefix will have to be stored in the "closure" state which persists across iterations.
Here's the state:
struct inner_state {
const char *prefix;
};
And here's the apply function:
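static void inner_apply(void **x, void *cl) {
struct inner_state *s = cl; // recover the closure's real type
char *name = *x; // x is &p->first, so *x is p->first
printf("%s%s", s->prefix, name); // separator first, then the name
s->prefix = ", "; // every later name is preceded by a comma
}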
Now we rewrite the code to assign the initial state and then call List_map in place of the loop:
struct inner_state s;
s.prefix = "";
List_map(list, inner_apply, &s);
The indirections look like this:
p->first points to the sequence of characters "hello"
&p->first points to p->first
the void **x that is passed to inner_apply is &p->first
*x has type void * and is p->first, so we can assign it to char *name without a cast
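The whole closure pattern can be tried without Hanson's library by using a stand-in. In the sketch below, `struct cell`, `map_sketch`, `print_state`, and `print_names` are all hypothetical; `map_sketch` only imitates the shape of List_map (an apply function that receives the address of each element plus a closure pointer). The state also carries the output stream, an addition made here so the function can write somewhere other than standard output.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in list cell, for illustration only; not Hanson's List_T. */
struct cell {
    void *first;
    struct cell *rest;
};

/* Stand-in for List_map: call apply on the address of each element. */
static void map_sketch(struct cell *list,
                       void apply(void **x, void *cl), void *cl)
{
    assert(apply != NULL);
    for (struct cell *p = list; p != NULL; p = p->rest)
        apply(&p->first, cl);
}

/* State that must persist from one call of the apply function to the next. */
struct print_state {
    FILE *out;
    const char *prefix;
};

static void print_apply(void **x, void *cl)
{
    struct print_state *s = cl;     /* void * of known type: assign, no cast */
    const char *name = *x;          /* x is &p->first, so *x is p->first */
    fprintf(s->out, "%s%s", s->prefix, name);
    s->prefix = ", ";               /* later names are preceded by a comma */
}

void print_names(FILE *out, struct cell *list)
{
    struct print_state s = { out, "" };
    map_sketch(list, print_apply, &s);
}
```

The local variable that a loop would have updated in place becomes a field of the state struct, and the loop body becomes the apply function.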
Idiom #1: Hanson style (the type abbreviation includes a pointer):
typedef struct foo *Foo;
...
Foo f;
Idiom #2: Bell Labs style (the type abbreviation does not include a pointer):
typedef struct foo foo;
...
foo *f;
Poison: mixing the two styles!
Note: you'll sometimes see capitalized names used with Bell Labs style. I've never seen lowercase names used with Hanson style.
Many abstractions require memory allocated dynamically on the heap. To use such abstractions correctly, without leaking memory, you must balance every allocation with a free. Here is a recipe:
Foo_T foo = Foo_new(...arguments...);
assert(foo != NULL);
... operations on foo, including calling functions that use foo ...
Foo_free(&foo); // free *foo and set foo to NULL
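For concreteness, here is what both sides of that recipe could look like for a toy abstraction. `Counter_T` and its three functions are hypothetical, invented only to show why the free function takes the address of the handle: it can then set the caller's variable to NULL, so an accidental later use fails an assertion instead of touching freed memory.

```c
#include <assert.h>
#include <stdlib.h>

struct counter {
    int count;
};
typedef struct counter *Counter_T;     /* hypothetical abstraction */

Counter_T Counter_new(int start)
{
    Counter_T c = malloc(sizeof(*c));
    assert(c != NULL);
    c->count = start;
    return c;
}

int Counter_next(Counter_T c)
{
    assert(c != NULL);
    return c->count++;
}

/* Takes a pointer to the handle so it can clear the caller's variable. */
void Counter_free(Counter_T *cp)
{
    assert(cp != NULL && *cp != NULL);
    free(*cp);
    *cp = NULL;
}

/* The recipe: allocate, check, use, free exactly once. */
int use_counter(void)
{
    Counter_T c = Counter_new(10);
    assert(c != NULL);
    int first = Counter_next(c);       /* 10 */
    int second = Counter_next(c);      /* 11 */
    Counter_free(&c);
    assert(c == NULL);                 /* the handle can no longer dangle */
    return first + second;
}
```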
void * values of known type
Suppose we are using qsort to sort an array of strings, where each string is represented by a char * pointer. No sane person wants to deal with typecasting, so we write the comparison function this way:
static int compare(const void *p1, const void *p2) {
const char * const *ps1 = p1;
const char * const *ps2 = p2;
return strcmp(*ps1, *ps2);
}
The idiom is that if you have a void * value of known type, you immediately assign it to a variable of that type. No explicit cast is needed. Here's another example:
struct foo *p = Array_get(a, i);
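Putting the comparison function together with a call to qsort might look like the following sketch. The wrapper `sort_strings` is a hypothetical name; the point to notice is that qsort hands the comparison function pointers to the array elements, and since each element is itself a `char *`, the parameters are really pointers to pointers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort passes the addresses of two array elements. Each element is a
 * const char *, so each argument is really a const char * const *. */
static int compare_strings(const void *p1, const void *p2)
{
    const char *const *ps1 = p1;    /* assign to the known type; no cast */
    const char *const *ps2 = p2;
    return strcmp(*ps1, *ps2);
}

/* Hypothetical wrapper: sort n strings into strcmp order, in place. */
void sort_strings(const char *strings[], size_t n)
{
    assert(strings != NULL || n == 0);
    if (n > 0)
        qsort(strings, n, sizeof(strings[0]), compare_strings);
}
```

Passing `strcmp` itself to qsort would be wrong here: it would treat the addresses of the array slots as if they were the strings.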
Suppose array a is a Hanson Array_T containing values of type struct pixel.
There are two ways to get an expression containing the ith element:
The Hanson approach:
assert(Array_size(a) == sizeof(struct pixel)); // detects some errors
... *(struct pixel *)Array_get(a, i) ... // use expression of struct type
The Ramsey approach:
struct pixel *p = Array_loc(a, i); // capture pointer into the array
// (valid until resized or freed)
assert(sizeof(*p) == Array_size(a));
... *p ... // use expression of struct type
Both approaches are valid. The Ramsey approach is more verbose but is also robust against changes in type. Why? The Ramsey approach has a single point of truth, that is, the type is mentioned in only one place. In the Hanson approach, the type is mentioned in multiple places, and it is easier to write inconsistent code.
Suppose function f() returns a value of type struct pixel you want to store in
one of Hanson's arrays. There are two ways to proceed.
The Hanson approach:
struct pixel pix = f(); // note f() does not return a pointer
assert(Array_size(a) == sizeof(pix)); // detects some errors
Array_put(a, i, &pix); // note use of C address-of operator '&'
The Ramsey approach:
struct pixel *p = Array_loc(a, i); // capture pointer into the array
// (valid until resized or freed)
assert(Array_size(a) == sizeof(*p)); // detects some errors
*p = f();
As before, both approaches are valid.
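The single-point-of-truth idea can be tried without the Array interface. The sketch below uses a stand-in, `struct sketch_array` with `sketch_new` and `sketch_loc`, which are hypothetical and only imitate the behavior described above (fixed-size elements, a function returning a pointer into the array). The `struct pixel` fields and `make_gray` are likewise invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct pixel {
    unsigned red, green, blue;         /* assumed fields, for illustration */
};

/* Stand-in for an array of fixed-size elements; not Hanson's Array_T. */
struct sketch_array {
    int length;                        /* number of elements */
    size_t size;                       /* bytes per element */
    char *bytes;                       /* length * size bytes of storage */
};

struct sketch_array sketch_new(int length, size_t size)
{
    assert(length > 0 && size > 0);
    struct sketch_array a = { length, size, calloc((size_t)length, size) };
    assert(a.bytes != NULL);
    return a;
}

/* Pointer to element i; valid until the storage is freed. */
void *sketch_loc(struct sketch_array *a, int i)
{
    assert(a != NULL && i >= 0 && i < a->length);
    return a->bytes + (size_t)i * a->size;
}

struct pixel make_gray(unsigned level) /* returns a struct, not a pointer */
{
    struct pixel pix = { level, level, level };
    return pix;
}

void store_gray(struct sketch_array *a, int i, unsigned level)
{
    struct pixel *p = sketch_loc(a, i);   /* the only mention of the type */
    assert(sizeof(*p) == a->size);        /* detects some errors */
    *p = make_gray(level);
}
```

If the element type ever changes, only the declaration of `p` in `store_gray` needs to change; the assertion and the assignment follow automatically.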
Here I initialize and use an array of arrays, mixing the Ramsey and Hanson idioms.
Suppose you want an Array_T of Array_T of double:
Array_T outer = Array_new(n, sizeof(Array_T)); // n elements of type Array_T
for (i = 0; i < n; i++) {
Array_T inner_array = Array_new(length_of_row(i), sizeof(double));
// variable number of elements in each inner array
for (j = 0; j < Array_length(inner_array); j++) {
double d = 0.0; // initial value of element
Array_put(inner_array, j, &d); // Array_put gets *pointer* to element
}
Array_put(outer, i, &inner_array); // again, put gets *pointer* to element
}
Now to access element (i, j), I remember that Array_get returns a pointer to the
element type:
Array_T *p = Array_get(outer, i);
assert(sizeof(*p) == Array_size(outer));
Array_T inner_array = *p;
double *q = Array_get(inner_array, j);
assert(sizeof(*q) == Array_size(inner_array));
return *q;
It can be difficult to cram a long printf or fprintf call into 80
columns. Exploit this unusual property of C: adjacent string
literals are concatenated at compile time.
Example:
fprintf(stderr, "%s: Things have gone horribly wrong: "
"%s is a file format I don't recognize, "
"I can't find any bytes on standard input, "
"and the dog ate %d pages of my homework!\n",
argv[0], argv[1], n-1);
Note especially there are no commas between the literals.
This idiom also enables many scurvy tricks with the C preprocessor.
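One of the milder preprocessor tricks is sketched below: a macro that expands to a string literal can be placed next to other literals, and the `#` operator can turn a numeric constant into a literal so that it joins in too. `PROGRAM_NAME`, `MAX_WIDTH`, and `report_too_wide` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define PROGRAM_NAME "brightness"      /* hypothetical program name */
#define MAX_WIDTH 80

/* Two levels are needed so MAX_WIDTH is expanded to 80 before '#'
 * turns it into the string literal "80". */
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

/* The format string is built from four adjacent literals at compile time.
 * This is safe only because neither macro expands to text containing '%'. */
void report_too_wide(FILE *err, int width)
{
    assert(err != NULL && width > MAX_WIDTH);
    fprintf(err, PROGRAM_NAME ": line is %d columns wide; "
                 "the limit is " STRINGIFY(MAX_WIDTH) "\n",
            width);
}
```

The limit is written once, in the `#define`, and appears in both the check and the message without being repeated by hand.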