Idioms for C programmers

This document collects some idiomatic examples of the C way of doing things. None of these examples have been tested. Please report errors or difficulties to comp40-staff (comp40-staff@cs.tufts.edu).

For the include statements, I've inserted a backslash before each statement, because the generator for this document deletes them otherwise. When you put include statements in your programs, leave the backslash out.

Reading from standard input or from one or more files named on the command line

The idea is to separate out the processing of an open file handle from the process of finding one or more open file handles. Idiom adapted from Kernighan and Ritchie, page 162, for a program without options:

\#include <stdlib.h>
\#include <stdio.h>

extern void do_something(FILE *fp);

int main(int argc, char *argv[])
{
        if (argc == 1) {
                do_something(stdin);
        } else {
                for (int i = 1; i < argc; i++) {
                        FILE *fp = fopen(argv[i], "r");
                        if (fp == NULL) {
                                fprintf(stderr, 
                                        "%s: %s %s %s\n",
                                        argv[0], "Could not open file",
                                        argv[i], "for reading");
                                exit(EXIT_FAILURE);
                        }
                        do_something(fp);
                        fclose(fp);
                }
        }
        return EXIT_SUCCESS;
}

Here's another approach:

\#include <stdio.h>
\#include <errno.h>
\#include <stdlib.h>

extern void do_something(FILE *fp);

static FILE *open_or_abort(char *fname, char *mode);

int main(int argc, char *argv[])
{
        if (argc == 1) {
                do_something(stdin);
        } else {
                for (int i = 1; i < argc; i++) {
                        FILE *fp = open_or_abort(argv[i], "r");
                        do_something(fp);
                        fclose(fp);
                }
        }
        return EXIT_SUCCESS;
}

static FILE *open_or_abort(char *fname, char *mode)
{
        FILE *fp = fopen(fname, mode);
        if (fp == NULL) {
                int rc = errno;
                fprintf(stderr,
                        "Could not open file %s with mode %s\n",
                        fname,
                        mode);
                exit(rc);
        }
        return fp;
}
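Either version of main leaves the per-file work to do_something. Purely as an illustration, here is a sketch of one possible do_something that reports how many lines a file contains. The helper count_lines is a hypothetical name of my own, not part of any library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts the newline characters remaining on fp. */
static long count_lines(FILE *fp)
{
        assert(fp != NULL);

        long lines = 0;
        int c;   /* int, not char, so that EOF is distinguishable */

        while ((c = getc(fp)) != EOF) {
                if (c == '\n') {
                        lines++;
                }
        }
        return lines;
}

/* One possible do_something: print the line count on standard output. */
void do_something(FILE *fp)
{
        printf("%ld\n", count_lines(fp));
}
```

Because do_something sees only an open FILE *, it works unchanged on standard input and on named files.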

Idiom for OS-specific error message on failing to open a file:

perror(argv[i]);  /* print the filename with message about *why* fopen() failed */

Idiom for elaborate error message:

\#include <errno.h>
\#include <string.h>
...
fprintf(stderr, "%s: could not open %s (%s)\n",
        argv[0], argv[i], strerror(errno));

Reading one line of input

The correct idiom for reading input is to

  1. Allocate a buffer

  2. Call fgets
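Here is a minimal sketch of those two steps, which also reports whether fgets managed to fit a whole line into the buffer. The names read_line and enum line_status are hypothetical, and the caller supplies the buffer, so the allocation can be static or dynamic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum line_status { LINE_OK, LINE_TRUNCATED, LINE_EOF };

/*
 * Reads at most size - 1 characters of one line into buf and
 * strips the trailing newline if it was read.
 */
static enum line_status read_line(FILE *fp, char *buf, size_t size)
{
        assert(fp != NULL && buf != NULL && size > 1);

        if (fgets(buf, (int)size, fp) == NULL) {
                return LINE_EOF;       /* end of file or read error */
        }

        size_t len = strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') {
                buf[len - 1] = '\0';   /* the whole line fit */
                return LINE_OK;
        }

        /*
         * No newline: either this is a final line with no newline
         * (fgets hit end of file) or the buffer filled up first.
         * A final unterminated line that exactly fills the buffer
         * is reported as truncated; the next call returns LINE_EOF.
         */
        return feof(fp) ? LINE_OK : LINE_TRUNCATED;
}
```

A caller that gets LINE_TRUNCATED knows the rest of the line is still waiting on the file and can decide how to recover.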

Allocation may be static or dynamic. The main issue is how to recover if fgets does not return an entire line. Assuming you can't just halt the program with an error message, these are your options:

Design poisons for line-oriented input

Here are some things to avoid when doing input one line at a time:

Getting rid of "Unused variable" warnings

Sometimes a contract insists that a function have certain arguments, but a particular implementation may not use them. It may be that you don't use argc and argv in main, for example. COMP 40 does not permit compile-time warnings, so what can you do? This:

int main(int argc, char *argv[])
{
        (void) argc;
        (void) argv;
        ...
}

Comparing strings

Gotcha alert! The C++ strings you are used to do not exist in C. In C, you simulate a string by a char * pointer, which points to a sequence of bytes ending in '\0'. This style causes all sorts of problems, most notably

const char *s1, *s2;  /* strings in the neighborhood  */

if (s1 == s2) {       /* SILENTLY GIVES WRONG ANSWERS */
        ... 
        /* go here only if *pointers* are identical */
        ...
}

The standard idiom for comparing strings is

if (strcmp(s1, s2) == 0) { 
        ... 
        /* strings are equal here */
        ...
}

Old-fashioned C programmers often write (but you should not)

if (!strcmp(s1, s2)) { 
        ... 
        /* strings are equal here */
        ...
}

The exclamation mark is easy to overlook, and the uncertain relationship between booleans, success/failure codes, and other finite conditions makes such code less clear. If the return value is intended as a boolean, then omit an explicit test; otherwise, put in an explicit test.
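One way to follow this rule is to hide the explicit test inside a function whose result really is a boolean. The sketch below assumes C99's <stdbool.h>; the names streq and is_help_flag are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true exactly when the strings have equal contents. */
static bool streq(const char *s1, const char *s2)
{
        assert(s1 != NULL && s2 != NULL);
        /* strcmp returns an ordering, not a boolean: test explicitly */
        return strcmp(s1, s2) == 0;
}

/* streq returns a boolean, so its callers omit the explicit test. */
static bool is_help_flag(const char *arg)
{
        return streq(arg, "-h") || streq(arg, "--help");
}
```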

Problem: comparing equal strings costs time proportional to the length of the string.

Hanson's idiom for constant-time string comparison

Hanson's Atom_new or Atom_string functions use a shared hash table to ensure that equal strings are represented by identical pointers. A single Atom_new is more expensive than a single strcmp, but when you are using strings in data structures, you will recover the cost by saving comparisons down the road. You may also save memory. One idiom is

const char *s1, *s2;  /* strings in the neighborhood */

s1 = Atom_string(s1); /* hash strings to unique pointers */
s2 = Atom_string(s2);

...

if (s1 == s2) { 
        ... 
        /* strings are guaranteed equal */
        ...
}

The behavior is neatly expressed mathematically:

Atom_string(s1) == Atom_string(s2) if and only if strcmp(s1, s2) == 0.

You use Atom_new if you want to create atomic strings that contain zeroes, as you might in some binary network protocols.
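To make the if-and-only-if property concrete, here is a toy sketch of the same idea. It is not Hanson's implementation: it uses a linear list where Hanson uses a hash table, it never frees anything, and the name toy_intern is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One previously seen string; the list is searched linearly. */
struct intern_node {
        struct intern_node *next;
        char *str;
};

static struct intern_node *interned = NULL;

/*
 * Returns a pointer that is identical for all strings with equal
 * contents: the first copy ever stored of those contents.
 */
static const char *toy_intern(const char *s)
{
        assert(s != NULL);

        for (struct intern_node *n = interned; n != NULL; n = n->next) {
                if (strcmp(n->str, s) == 0) {
                        return n->str;   /* seen before: reuse pointer */
                }
        }

        /* first time we see these contents: store a private copy */
        struct intern_node *n = malloc(sizeof(*n));
        assert(n != NULL);
        n->str = malloc(strlen(s) + 1);
        assert(n->str != NULL);
        strcpy(n->str, s);
        n->next = interned;
        interned = n;
        return n->str;
}
```

After interning, comparing two strings is a single pointer comparison, no matter how long the strings are.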

Printing strings separated by commas

This idiom is notable for its simple control flow. It can be generalized to other separators besides commas and other things to print besides strings.

\#include <stdio.h>
...
const char *prefix = "";

for (int i = 0; i < nthings; i++) {
        printf("%s%s", prefix, things[i]);
        prefix = ", ";
}

Notice that to reuse this code, you'll have to reset prefix. If this code is in a function, it will get reset.
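Here is a sketch of the function version, generalized to any separator and any output stream. The name print_separated and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Prints things[0..nthings-1] on out with separator between elements. */
static void print_separated(FILE *out, const char *things[], int nthings,
                            const char *separator)
{
        assert(out != NULL && separator != NULL);

        const char *prefix = "";   /* reset on every call */

        for (int i = 0; i < nthings; i++) {
                fprintf(out, "%s%s", prefix, things[i]);
                prefix = separator;
        }
}
```

Because prefix is a local variable, every call starts with the empty prefix and no manual reset is needed.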

Using List_map to print strings separated by newlines

For a list we'd like to write

const char *prefix = "";

foreach name in list { /* the iteration abstraction does not exist in C */
        printf("%s%s", prefix, name);
        prefix = "\n";
}

Using the List_map interface, name will be pointed to by a parameter, and prefix will have to be stored in the "closure" state which persists across iterations.

Here's the state:

/* Loop state for print_list() function */
struct print_list_state { 
        const char *prefix;
};

And here's the apply function:

/*
 * Prints the elements of a list separated by newlines
 * Intended to be passed to List_map()
 * Need to reset prefix in loop state
 */
static void print_list_apply(void **list_node, void *loop_state_closure)
{
        struct print_list_state *state = loop_state_closure;
        char *name = *list_node;  
        /* 
         * NB:  if lst is the current list cell, list_node is 
         *      &(lst->first), so *list_node is lst->first 
         */

        /* First element isn't preceded by newline; the rest are */
        printf("%s%s", state->prefix, name);
        state->prefix = "\n";
}

Now we rewrite the code to assign the initial state and then call List_map in place of the loop:

struct print_list_state s;
s.prefix = "";

List_map(list, print_list_apply, &s);

Again, the prefix is still a newline after the call to List_map. A good idea is to write a print function that encapsulates the state structure and the call to List_map.
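Here is a sketch of such a print function. To keep the example self-contained I use a stand-in struct list and list_map that only mimic the shape of Hanson's list and List_map; both stand-ins, the extra out field, and the name print_list are assumptions of mine. With the real interface you would call List_map instead.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for Hanson's list cell (assumed shape, for illustration) */
struct list {
        struct list *rest;
        void *first;
};

/* Stand-in with the same shape as List_map */
static void list_map(struct list *list,
                     void apply(void **x, void *cl), void *cl)
{
        assert(apply != NULL);
        for (; list != NULL; list = list->rest) {
                apply(&list->first, cl);
        }
}

/* Loop state for print_list() */
struct print_list_state {
        FILE *out;            /* where to print (my addition) */
        const char *prefix;   /* printed before the next element */
};

static void print_list_apply(void **list_node, void *loop_state_closure)
{
        struct print_list_state *state = loop_state_closure;
        const char *name = *list_node;

        fprintf(state->out, "%s%s", state->prefix, name);
        state->prefix = "\n";
}

/* Builds a fresh state on every call, so the prefix is always reset. */
static void print_list(FILE *out, struct list *list)
{
        struct print_list_state s = { out, "" };

        list_map(list, print_list_apply, &s);
}
```

Callers of print_list never see the state structure, so they cannot forget to reset it.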

More detail on the indirections:

The indirections look like this:

If lst->first points to the sequence of characters "hello"

&(lst->first) points to lst->first

the void **list_node that is passed to print_list_apply is &(lst->first)

*list_node has type void * and is lst->first, so we can assign it to name without a cast

Idioms for void ** pointers

A void ** pointer almost always means "pass by reference a pointer to an unknown type." Here are a couple of idioms:

Type abbreviations for structure types

Idiom #1: Hanson style (the type abbreviation includes a pointer):

typedef struct foo *Foo;
...

Foo f;

Idiom #2: Bell Labs style (the type abbreviation does not include a pointer):

typedef struct foo foo;
...

foo *f;

Poison: mixing the two styles!

Note: you'll sometimes see capitalized names used with Bell Labs style. I've never seen lowercase names used with Hanson style.

Using an abstraction defined in an interface Foo

Many abstractions require memory allocated dynamically on the heap. To use such abstractions correctly, without leaking memory, you must balance every allocation with a free. Here is a recipe:

Foo_T foo = Foo_new(...arguments...);
assert(foo != NULL);

... 
/* operations on foo, including calling functions that use foo */
...

Foo_free(&foo); /* free *foo and set foo to NULL */
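To show why the free function takes the address of the pointer, here is a sketch of a tiny abstraction written in this style. Counter_T and its three functions are hypothetical, invented only for this example.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Counter_T *Counter_T;   /* Hanson style: includes pointer */

struct Counter_T {
        int count;
};

/* Allocates a counter on the heap; the caller must free it. */
static Counter_T Counter_new(int start)
{
        Counter_T c = malloc(sizeof(*c));
        assert(c != NULL);
        c->count = start;
        return c;
}

/* Returns the current count, then advances it. */
static int Counter_next(Counter_T c)
{
        assert(c != NULL);
        return c->count++;
}

/* Frees *cp and sets *cp to NULL so the stale pointer can't be reused. */
static void Counter_free(Counter_T *cp)
{
        assert(cp != NULL && *cp != NULL);
        free(*cp);
        *cp = NULL;
}
```

After Counter_free(&c), the variable c is NULL, so an accidental later use trips an assertion instead of silently reading freed memory.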

Handling void * values of known type

Suppose we are using qsort to sort an array of strings, where each string is represented by a char * pointer. No sane person wants to deal with typecasting, so we write the comparison function this way:

static int compare(const void *p1, const void *p2)
{
        const char * const *ps1 = p1;
        const char * const *ps2 = p2;

        return strcmp(*ps1, *ps2);
}

The idiom is that if you have a void * value of known type, you immediately assign it to a variable of that type. No explicit cast is needed. Here's another example:

struct node *p = Table_get(t, Atom_string("root"));
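For completeness, here is a sketch of the qsort call that goes with the comparison function above. The wrapper name sort_strings is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to array elements; each element is a char * */
static int compare(const void *p1, const void *p2)
{
        const char * const *ps1 = p1;
        const char * const *ps2 = p2;

        return strcmp(*ps1, *ps2);
}

/* Sorts an array of n strings in place, in strcmp order. */
static void sort_strings(const char *strings[], size_t n)
{
        assert(strings != NULL);
        /* sizeof(strings[0]): the element size has one point of truth */
        qsort(strings, n, sizeof(strings[0]), compare);
}
```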

Using unboxed arrays

Be alert that I have retired Hanson's Array abstraction. I have replaced it with UArray, an abstraction of unboxed arrays. (The implementation is exactly what's in the book, but the interface is different.)

Suppose array a is an unboxed UArray_T containing values of type struct pixel. Here's how you get a pointer to the ith element:

The Ramsey approach:

struct pixel *p = UArray_at(a, i);   /* capture pointer into the array
                                      * (valid until resized or freed) 
                                      */
assert(sizeof(*p) == UArray_size(a));
... *p ...        /* use expression of struct type */

This idiom is robust against changes in type. Why? It has a single point of truth, that is, the type is mentioned in only one place. In Hanson's book, you will see types mentioned in multiple places, and it is easier to write inconsistent code. That's one reason I've retired Hanson's arrays. (The other is that students found them hard to learn.)

Initializing array elements

Here's an idiom for initializing element i of an array with an empty Set_T:

Set_T *setp = UArray_at(array, i);
assert(sizeof(*setp) == UArray_size(array));
*setp = Set_new(10, NULL, NULL);

Storing values into an unboxed array

Suppose function f() returns a value of type struct pixel you want to store in an unboxed array. Here's how you do it:

struct pixel *p = UArray_at(a, i);   /* capture pointer into the array
                                      * (valid until resized or freed)
                                      */
assert(UArray_size(a) == sizeof(*p)); /* detects some errors */
*p = f();

Example of array of arrays

Here I initialize and use an array of arrays. Suppose you want a UArray_T of UArray_T of double:

/* n elements of type UArray_T */
UArray_T outer = UArray_new(n, sizeof(UArray_T));

for (i = 0; i < n; i++) {
        UArray_T inner_array = UArray_new(length_of_row(i), 
                                          sizeof(double));

        /* variable number of elements in each inner array */
        for (j = 0; j < UArray_length(inner_array); j++) {
                double *elemp = UArray_at(inner_array, j); 
                *elemp = 0.0; /* initial value of element */
        }

        /* point to slot for inner array */
        UArray_T *innerp = UArray_at(outer, i); 
        *innerp = inner_array;
}

Now to access element (i, j), I remember that UArray_at returns a pointer to an element:

UArray_T *p = UArray_at(outer, i);
assert(sizeof(*p) == UArray_size(outer));

UArray_T inner_array = *p;

double *q = UArray_at(inner_array, j);
assert(sizeof(*q) == UArray_size(inner_array));

return *q;

Allocating memory

The following anti-idiom, although seen frequently in the code of those who have not been taught better, is anathema to C programmers:

Thing_T p = malloc(sizeof(struct Thing_T)); /* not acceptable */
assert(p != NULL);

Such code is not acceptable for COMP 40:

The correct way to write these allocations is with a single point of truth:

Thing_T p = malloc(sizeof(*p));  /* established C idiom */
assert(p != NULL);

This code is good because

This business of allocation and deallocation is so tricky that we recommend you use Hanson's macros:

Thing_T p;
NEW(p); // the assertion is included in NEW
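When you do call malloc directly, the single-point-of-truth idiom extends naturally to arrays. Here is a sketch; struct point and new_points are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

struct point {
        double x, y;
};

/* Allocates n points, all at the origin; the caller must free them. */
static struct point *new_points(size_t n)
{
        assert(n > 0);

        /* the element type appears only in the declaration of ps */
        struct point *ps = malloc(n * sizeof(*ps));
        assert(ps != NULL);

        for (size_t i = 0; i < n; i++) {
                ps[i].x = 0.0;
                ps[i].y = 0.0;
        }
        return ps;
}
```

If ps later becomes a pointer to some other structure, the call to malloc is still correct without being edited.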

Writing long string literals

It can be difficult to cram a long printf or fprintf call into 80 columns. Exploit this unusual property of C: adjacent string literals are concatenated at compile time.

Example:

fprintf(stderr, "%s: Things have gone horribly wrong: "
                "%s is a file format I don't recognize, "
                "I can't find any bytes on standard input, "
                "and the dog ate %d pages of my homework!\n",
                argv[0], argv[1], n - 1);

Note especially there are no commas between the literals.

This idiom also enables many scurvy tricks with the C preprocessor.
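As one example of such a trick, the sketch below builds a scanf format string from a macro, so the field width can never disagree with the buffer size. The names STR, XSTR, MAX_NAME, and read_name are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define STR(x)  #x       /* turn the argument's text into a literal */
#define XSTR(x) STR(x)   /* expand macros in the argument first */

#define MAX_NAME 31      /* hypothetical limit on name length */

/*
 * Reads one whitespace-delimited word of at most MAX_NAME characters.
 * Returns 1 if a word was read.
 */
static int read_name(const char *input, char name[MAX_NAME + 1])
{
        assert(input != NULL && name != NULL);

        /* "%" "31" "s" are concatenated into the single literal "%31s" */
        return sscanf(input, "%" XSTR(MAX_NAME) "s", name);
}
```

Changing MAX_NAME changes both the buffer size and the format string, which is another single point of truth.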

Printing integers of known width

The designers of C, unlike the designers of Java, decided that programmers did not need to know how many bits are in a type like int or unsigned long. As a result, it is nearly impossible to write code that is portable against changes in the size of a machine word.

This problem was fixed in C99 with the introduction of the <inttypes.h> interface, which contains a variety of integer types and some print macros. Most of you will need it only for printing. Here are some examples of the C99 idiom for printing integers of known sizes:

\#include <stdio.h>
\#include <stdlib.h>
\#include <inttypes.h>

int main(void)
{
        uint64_t big     = (uint64_t)1 << 63;
        int16_t negative = ~(int16_t)0;

        printf("%" PRIu64 " is a large number, as we can see by its"
               " hex\nrepresentation 0x%016" PRIx64 ".\n%" PRId16 " is a "
               "negative number of very small magnitude.\n",
               big, big, negative);

        return EXIT_SUCCESS;
}

Macros PRIu64, PRIx64, and PRId16 are like a regular u, x, or d, except correctly sized for 64-bit, 64-bit, and 16-bit integers respectively.