Idioms for C programmers

This document collects some idiomatic examples of the C way of doing things. None of these examples have been tested. Please report errors or difficulties to comp40-staff.

Reading from standard input or from one or more files named on the command line

The idea is to separate out the processing of an open file handle from the process of finding one or more open file handles. Idiom adapted from Kernighan and Ritchie, page 162, for a program without options:

#include <stdio.h>
#include <stdlib.h>

extern void do_something(FILE *);

int main(int argc, char *argv[]) {
  if (argc == 1) {
    do_something(stdin);
  } else {
    for (int i = 1; i < argc; i++) {
      FILE *fp = fopen(argv[i], "r");
      if (fp == NULL) {
        fprintf(stderr, "%s: Could not open file %s for reading\n",
                        argv[0], argv[i]);
        exit(1);
      }
      do_something(fp);
      fclose(fp);
    }
  }
}

Idiom for OS-specific error message on failing to open a file:

perror(argv[i]);  // print the filename with message about *why* fopen() failed

Idiom for elaborate error message (strerror is declared in <string.h>, errno in <errno.h>):

...
fprintf(stderr, "%s: could not open %s (%s)\n",
        argv[0], argv[i], strerror(errno));

Reading one line of input

The correct idiom for reading input is to

  1. Allocate a buffer

  2. Call fgets
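The two steps above can be sketched as follows, assuming the caller supplies a fixed-size buffer. The name read_line and its return convention are invented for this illustration; the only library calls are fgets, strchr, and strlen. The sketch shows one way to tell whether fgets delivered an entire line. It is not a complete recovery strategy.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not from any library: reads one line from fp into
 * buf (capacity size, which must be at least 2).
 * Returns  1 if a complete line was read (the trailing '\n' is removed),
 *          0 if the line did not fit, so buf holds only its first part,
 *         -1 on end of file or read error with nothing read.
 * A final line that lacks a newline counts as complete when it is
 * shorter than the buffer. */
int read_line(FILE *fp, char *buf, size_t size) {
  if (fgets(buf, (int)size, fp) == NULL) {
    return -1;
  }
  char *newline = strchr(buf, '\n');
  if (newline != NULL) {
    *newline = '\0';   /* the entire line fit in the buffer */
    return 1;
  }
  /* No newline: either the buffer filled up or input ended mid-line */
  return strlen(buf) + 1 < size ? 1 : 0;
}
```

A caller that gets 0 can call read_line again to fetch the rest of the line, or discard input up to the next newline.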

Allocation may be static or dynamic. The main issue is how to recover if fgets does not return an entire line. Assuming you can't just halt the program with an error message, these are your options:

Design poisons for line-oriented input

Here are some things to avoid when doing input one line at a time:

Comparing strings

Gotcha alert! The C++ strings you are used to do not exist in C. In C, you simulate a string by a char * pointer, which points to a sequence of bytes ending in '\0'. This style causes all sorts of problems, most notably

const char *s1, *s2;  // strings in the neighborhood

if (s1 == s2) { // SILENTLY GIVES WRONG ANSWERS
  ... // go here only if *pointers* are identical
}

The standard idiom for comparing strings is

if (strcmp(s1, s2) == 0) { 
  ... // strings are equal here
}

The seasoned C programmer often writes

if (!strcmp(s1, s2)) { 
  ... // strings are equal here
}

The exclamation mark is easy to overlook.

Problem: comparing equal strings costs time proportional to the length of the string.

Hanson's idiom for constant time string comparison

Hanson's Atom_new or Atom_string functions use a shared hash table to ensure that equal strings are represented by identical pointers. A single Atom_new is more expensive than a single strcmp, but when you are using strings in data structures, you will recover the cost by saving comparisons down the road. You may also save memory. One idiom is

const char *s1, *s2;  // strings in the neighborhood

s1 = Atom_string(s1); // hash string to a unique pointer
s2 = Atom_string(s2); // likewise

...

if (s1 == s2) { 
  ... // strings are guaranteed equal
}

The behavior is neatly expressed mathematically:

Atom_string(s1) == Atom_string(s2) if and only if strcmp(s1, s2) == 0.

Use Atom_new, which takes an explicit length, if you want to create atomic strings that contain zero bytes, as you might in some binary network protocols.
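To make the guarantee concrete, here is a minimal sketch of the same idea, assuming nothing from Hanson's library. The function intern and the struct interned are hypothetical names, and the table is an unsorted linked list, not the hash table that Atom_string uses, so this shows the pointer-identity property but not the performance.

```c
#include <stdlib.h>
#include <string.h>

/* Illustration only: a hypothetical intern() built on a linked list.
 * Hanson's Atom_string uses a hash table instead. */
struct interned {
  struct interned *next;
  char *chars;
};

static struct interned *table = NULL;   /* every string seen so far */

/* Returns a canonical pointer for s: equal strings yield identical
 * pointers. Returns NULL if memory runs out. */
const char *intern(const char *s) {
  for (struct interned *p = table; p != NULL; p = p->next) {
    if (strcmp(p->chars, s) == 0) {
      return p->chars;                  /* seen before: reuse its copy */
    }
  }
  struct interned *node = malloc(sizeof(*node));
  if (node == NULL) {
    return NULL;
  }
  node->chars = malloc(strlen(s) + 1);
  if (node->chars == NULL) {
    free(node);
    return NULL;
  }
  strcpy(node->chars, s);               /* keep a private copy */
  node->next = table;
  table = node;
  return node->chars;
}
```

After interning, two strings with the same characters can be compared with ==, even if they started life in different buffers.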

Printing strings separated by commas

This idiom is notable for its simple control flow. It can be generalized to other separators besides commas and other things to print besides strings.

...
const char *prefix = "";
for (int i = 0; i < nthings; i++) {
  printf("%s%s", prefix, things[i]);
  prefix = ", ";
}
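As a sketch of the generalization mentioned above, the loop can be wrapped in a function that takes the separator and the output stream as parameters. The name print_separated and its parameter list are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical generalization: writes the n strings in things to fp,
 * with sep between each adjacent pair. */
void print_separated(FILE *fp, const char *sep,
                     const char *things[], int n) {
  const char *prefix = "";      /* nothing precedes the first item */
  for (int i = 0; i < n; i++) {
    fprintf(fp, "%s%s", prefix, things[i]);
    prefix = sep;               /* every later item is preceded by sep */
  }
}
```

The loop body needs no test for the first or last element; changing prefix after the first iteration does all the work.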

Using List_map to print strings separated by commas

For a list we'd like to write

const char *prefix = "";
foreach name in list { // the iteration abstraction does not exist in C
  printf("%s%s", prefix, name);
  prefix = ", ";
}

Using the List_map interface, name will be pointed to by a parameter, and prefix will have to be stored in the "closure" state which persists across iterations.

Here's the state:

struct inner_state { 
  const char *prefix;
};

And here's the apply function:

static void inner_apply(void **x, void *cl) {
  struct inner_state *s = cl;
  char *name = *x;  // x is &p->first, so *x is p->first

  printf("%s%s", s->prefix, name);
  s->prefix = ", ";  // every later name is preceded by a separator
}

Now we rewrite the code to assign the initial state and then call List_map in place of the loop:

struct inner_state s;
s.prefix = "";
List_map(list, inner_apply, &s);

More detail on the indirections:

p->first points to the sequence of characters "hello"

&p->first points to p->first

the void **x that is passed to inner_apply is &p->first

*x has type void * and is p->first, so we can assign it to char *name without a cast
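To try the closure idiom without Hanson's library, here is a self-contained sketch. The struct node and the function map are stand-ins written for this example; they only imitate the shape of List_T and List_map (an apply function receiving void **x and void *cl). The state also carries an output stream, which is an addition made here so the result can go somewhere other than standard output.

```c
#include <stdio.h>

/* Stand-in for Hanson's list, just enough to run the example */
struct node {
  struct node *rest;
  void *first;
};

/* Stand-in for List_map: calls apply on the address of each first field */
static void map(struct node *list,
                void apply(void **x, void *cl), void *cl) {
  for (struct node *p = list; p != NULL; p = p->rest) {
    apply(&p->first, cl);
  }
}

/* State that persists across calls to the apply function */
struct inner_state {
  const char *prefix;
  FILE *out;
};

static void inner_apply(void **x, void *cl) {
  struct inner_state *s = cl;   /* void * of known type: assign, no cast */
  const char *name = *x;        /* x is &p->first, so *x is p->first */
  fprintf(s->out, "%s%s", s->prefix, name);
  s->prefix = ", ";             /* later names are preceded by a comma */
}

/* Hypothetical wrapper: prints every name in list to out */
void print_names(struct node *list, FILE *out) {
  struct inner_state s = { "", out };
  map(list, inner_apply, &s);
}
```

Because the prefix lives in the state struct, not in a local variable of the apply function, the change made on the first call is still visible on the second.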

Type abbreviations for structure types

Idiom #1: Hanson style (the type abbreviation includes a pointer):

typedef struct foo *Foo;
...

Foo f;

Idiom #2: Bell Labs style (the type abbreviation does not include a pointer):

typedef struct foo foo;
...

foo *f;

Poison: mixing the two styles!

Note: you'll sometimes see capitalized names used with Bell Labs style. I've never seen lowercase names used with Hanson style.

Using an abstraction defined in an interface Foo

Many abstractions require memory allocated dynamically on the heap. To use such abstractions correctly, without leaking memory, you must balance every allocation with a free. Here is a recipe:

Foo_T foo = Foo_new(...arguments...);
assert(foo != NULL);

... operations on foo, including calling functions that use foo ...

Foo_free(&foo); // free *foo and set foo to NULL
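To show why Foo_free takes the address of the variable, here is a minimal sketch of what such an interface might look like inside. Everything here is hypothetical: Foo_T, its single int field, and Foo_value are invented for illustration. This version of Foo_new returns NULL on allocation failure, which matches the assert in the recipe above; a real interface may report failure differently.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical abstraction. A real interface would put the typedef and
 * prototypes in foo.h and keep the struct definition private to foo.c. */
typedef struct Foo_T *Foo_T;
struct Foo_T {
  int value;
};

Foo_T Foo_new(int value) {
  Foo_T foo = malloc(sizeof(*foo));
  if (foo != NULL) {
    foo->value = value;
  }
  return foo;   /* NULL if allocation failed; the caller checks */
}

int Foo_value(Foo_T foo) {
  assert(foo != NULL);
  return foo->value;
}

/* Takes the *address* of the client's variable so that the variable
 * itself can be set to NULL after the memory is released. */
void Foo_free(Foo_T *foop) {
  assert(foop != NULL && *foop != NULL);
  free(*foop);
  *foop = NULL;
}
```

Setting the client's variable to NULL turns many accidental uses after free into immediate, detectable failures.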

Handling void * values of known type

Suppose we are using qsort to sort an array of strings, where each string is represented by a char * pointer. No sane person wants to deal with typecasting, so we write the comparison function this way:

static int compare(const void *p1, const void *p2) {
  const char * const *ps1 = p1;
  const char * const *ps2 = p2;
  return strcmp(*ps1, *ps2);
}

The idiom is that if you have a void * value of known type, you immediately assign it to a variable of that type. No explicit cast is needed. Here's another example:

struct foo *p = Array_get(a, i);
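For completeness, here is a sketch of the qsort comparison function from this section in use. The wrapper sort_strings is a hypothetical name introduced for the example; qsort and strcmp are the standard library functions.

```c
#include <stdlib.h>
#include <string.h>

/* Each void * really points at an array element, and each element is a
 * const char *, so the known type is "pointer to const char *". */
static int compare(const void *p1, const void *p2) {
  const char * const *ps1 = p1;
  const char * const *ps2 = p2;
  return strcmp(*ps1, *ps2);
}

/* Hypothetical wrapper: sorts an array of n strings into strcmp order */
void sort_strings(const char *strings[], size_t n) {
  qsort(strings, n, sizeof(strings[0]), compare);
}
```

Note that qsort hands the comparison function pointers to the elements, not the elements themselves, which is why compare dereferences once before calling strcmp.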

Using values stored in Hanson's arrays

Suppose array a is a Hanson Array_T containing values of type struct pixel. There are two ways to get an expression containing the ith element:

The Hanson approach:

assert(Array_size(a) == sizeof(struct pixel));  // detects some errors
... *(struct pixel *)Array_get(a, i) ...        // use expression of struct type

The Ramsey approach:

struct pixel *p = Array_loc(a, i);   // capture pointer into the array
                                     // (valid until resized or freed)
assert(sizeof(*p) == Array_size(a));
... *p ...        // use expression of struct type

Both approaches are valid. The Ramsey approach is more verbose but is also robust against changes in type. Why? The Ramsey approach has a single point of truth, that is, the type is mentioned in only one place. In the Hanson approach, the type is mentioned in multiple places, and it is easier to write inconsistent code.

Storing values into Hanson's arrays

Suppose function f() returns a value of type struct pixel you want to store in one of Hanson's arrays. There are two ways to proceed.

The Hanson approach:

struct pixel pix = f();  // note f() does not return a pointer
assert(Array_size(a) == sizeof(pix));  // detects some errors
Array_put(a, i, &pix);   // note use of C address-of operator '&'

The Ramsey approach:

struct pixel *p = Array_loc(a, i);   // capture pointer into the array
                                     // (valid until resized or freed)
assert(Array_size(a) == sizeof(*p)); // detects some errors
*p = f();

As before, both approaches are valid.

Combined example of Hanson's arrays

Here I initialize and use an array of arrays, mixing the Ramsey and Hanson idioms. Suppose you want an Array_T of Array_T of double:

Array_T outer = Array_new(n, sizeof(Array_T)); // n elements of type Array_T
for (int i = 0; i < n; i++) {
  Array_T inner_array = Array_new(length_of_row(i), sizeof(double));
                          // variable number of elements in each inner array
  for (int j = 0; j < Array_length(inner_array); j++) {
    double d = 0.0; // initial value of element
    Array_put(inner_array, j, &d); // Array_put gets *pointer* to element
  }
  Array_put(outer, i, &inner_array); // again, put gets *pointer* to element
}

Now to access element (i, j), I remember that Array_get returns a pointer to the element type:

Array_T *p = Array_get(outer, i);
assert(sizeof(*p) == Array_size(outer));
Array_T inner_array = *p;
double *q = Array_get(inner_array, j);
assert(sizeof(*q) == Array_size(inner_array));
return *q;

Writing long string literals

It can be difficult to cram a long printf or fprintf call into 80 columns. Exploit this unusual property of C: adjacent string literals are concatenated at compile time.

Example:

fprintf(stderr, "%s: Things have gone horribly wrong: "
                "%s is a file format I don't recognize, "
                "I can't find any bytes on standard input, "
                "and the dog ate %d pages of my homework!\n",
                argv[0], argv[1], n-1);

Note especially there are no commas between the literals.

This idiom also enables many scurvy tricks with the C preprocessor.

