Idioms for C programmers
This document collects some idiomatic examples of the
C way of doing things. None of these
examples have been tested.
Reading from standard input or from one or more files named on the command line
The idea is to separate out the processing of an open file handle from the process of finding one or more open file handles. Idiom adapted from Kernighan and Ritchie, page 162, for a program without options:
#include <stdlib.h>
#include <stdio.h>
extern void do_something(FILE *fp);
int main(int argc, char *argv[])
{
        if (argc == 1) {
                do_something(stdin);
        } else {
                for (int i = 1; i < argc; i++) {
                        FILE *fp = fopen(argv[i], "r");
                        if (fp == NULL) {
                                fprintf(stderr,
                                        "%s: %s %s %s\n",
                                        argv[0], "Could not open file",
                                        argv[i], "for reading");
                                exit(EXIT_FAILURE);
                        }
                        do_something(fp);
                        fclose(fp);
                }
        }
        return EXIT_SUCCESS;
}
Here's another approach:
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
extern void do_something(FILE *fp);
static FILE *open_or_abort(const char *fname, const char *mode);
int main(int argc, char *argv[])
{
        if (argc == 1) {
                do_something(stdin);
        } else {
                for (int i = 1; i < argc; i++) {
                        FILE *fp = open_or_abort(argv[i], "r");
                        do_something(fp);
                        fclose(fp);
                }
        }
        return EXIT_SUCCESS;
}
static FILE *open_or_abort(const char *fname, const char *mode)
{
        FILE *fp = fopen(fname, mode);
        if (fp == NULL) {
                int rc = errno; /* save errno before calling anything else */
                fprintf(stderr,
                        "Could not open file %s with mode %s (%s)\n",
                        fname,
                        mode,
                        strerror(rc));
                exit(EXIT_FAILURE);
        }
        return fp;
}
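As a sketch of what do_something might be, here is a hypothetical version that counts lines, together with a function that does the "finding open file handles" half of the idiom. The names count_lines and count_lines_in_args are illustrative only, not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical do_something(): counts newline characters on a stream
 * that is already open. It neither knows nor cares where fp came from. */
long count_lines(FILE *fp)
{
        long n = 0;
        int c;  /* int, not char, so that EOF can be told apart from data */
        while ((c = getc(fp)) != EOF)
                if (c == '\n')
                        n++;
        return n;
}

/* The "find the open file handles" half: standard input when there are
 * no arguments, otherwise each named file in turn. Returns the total. */
long count_lines_in_args(int argc, char *argv[])
{
        if (argc == 1)
                return count_lines(stdin);
        long total = 0;
        for (int i = 1; i < argc; i++) {
                FILE *fp = fopen(argv[i], "r");
                if (fp == NULL) {
                        perror(argv[i]);
                        exit(EXIT_FAILURE);
                }
                total += count_lines(fp);
                fclose(fp);
        }
        return total;
}
```

With these two functions, main reduces to printing the value of count_lines_in_args(argc, argv).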
Idiom for OS-specific error message on failing to open a file:
perror(argv[i]); /* print filename and message about *why* fopen() failed */
Idiom for elaborate error message:
#include <errno.h>
#include <string.h>
...
fprintf(stderr, "%s: could not open %s (%s)\n",
argv[0], argv[i], strerror(errno));
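Packaging the strerror idiom in a helper might look like the following minimal sketch. The name open_or_explain is hypothetical, and the code assumes a POSIX-style fopen() that sets errno on failure; ISO C alone does not promise that, so the sketch falls back to "unknown error" when errno is still zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Opens path, or prints "progname: could not open path (reason)" on
 * standard error and returns NULL. errno is left set for the caller. */
FILE *open_or_explain(const char *progname, const char *path,
                      const char *mode)
{
        errno = 0;
        FILE *fp = fopen(path, mode);
        if (fp == NULL) {
                int saved = errno;
                fprintf(stderr, "%s: could not open %s (%s)\n",
                        progname, path,
                        saved != 0 ? strerror(saved) : "unknown error");
                errno = saved;  /* fprintf() may clobber errno; restore it */
        }
        return fp;
}
```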
Getting rid of "Unused variable" warnings
Sometimes a contract insists that a function have certain arguments, but a particular implementation may not use all of them. It may be that you don't use argc and argv in main, for example. CS 40 does not permit you to have compile-time warnings, so what can you do? This:
int main(int argc, char *argv[])
{
(void) argc;
(void) argv;
...
}
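The same trick applies to callbacks, where the contract is set by whoever calls the function. In this minimal sketch the visitor type and the names visit_all and count_visit are hypothetical; count_visit must accept an element it never looks at, so it casts that parameter to void to keep warnings such as GCC's -Wunused-parameter quiet.

```c
#include <assert.h>

/* The contract: every visitor receives an element and a closure. */
typedef void visit_fn(void *elem, void *closure);

/* Counts elements; it satisfies the contract but ignores elem. */
static void count_visit(void *elem, void *closure)
{
        (void) elem;            /* deliberately unused */
        int *count = closure;
        (*count)++;
}

/* Applies visit to each of the n elements, passing closure along. */
static void visit_all(void *elems[], int n, visit_fn *visit, void *closure)
{
        for (int i = 0; i < n; i++)
                visit(elems[i], closure);
}
```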
Comparing strings
Gotcha alert! The C++ strings you are used to do not exist in C. In C, you simulate a string by a char * pointer, which points to a sequence of bytes ending in '\0'. This style causes all sorts of problems, most notably this one:
const char *s1, *s2; /* strings in the neighborhood */
⋮
if (s1 == s2) { /* SILENTLY GIVES WRONG ANSWERS */
...
/* go here only if *pointers* are identical */
...
}
The standard idiom for comparing strings is
if (strcmp(s1, s2) == 0) {
...
/* strings are equal here */
...
}
Old-fashioned C programmers often write (but you should not)
if (!strcmp(s1, s2)) {
...
/* strings are equal here */
...
}
The exclamation mark is easy to overlook, and the uncertain relationship between booleans, success/failure codes, and other finite conditions makes such code less clear. If the return value of a function is intended as a boolean, then omit an explicit test; otherwise, put in an explicit test.
Note that comparing equal strings costs time proportional to the length of the string. For constant-time comparisons, use Hanson's idiom described below.
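Following the advice about booleans, one option is to write the explicit test once, inside a function whose result really is a boolean. The name streq is hypothetical. The test below also shows that two arrays with equal contents live at different addresses, which is exactly why == gives the wrong answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True when s1 and s2 contain the same characters. The result is a
 * genuine boolean, so callers may write if (streq(a, b)) with no test. */
static bool streq(const char *s1, const char *s2)
{
        return strcmp(s1, s2) == 0;
}
```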
Printing strings separated by commas
This idiom is notable for its simple control flow. It can be generalized to other separators besides commas and other things to print besides strings.
#include <stdio.h>
...
const char *prefix = "";
for (int i = 0; i < nthings; i++) {
        printf("%s%s", prefix, things[i]);
        prefix = ", ";
}
Notice that to reuse this code, you'll have to reset prefix. If this code is in a function, prefix is reset automatically every time the function is called.
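The idiom works just as well when the output goes into a buffer instead of standard output, and with any separator. Here is a minimal sketch; the function name join_strings and its truncate-on-overflow behavior are assumptions made for illustration. Because prefix is a local variable, it starts out empty on every call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes things[0..n-1] into buf, separated by sep. If buf is too small
 * the output is truncated, but it is always NUL-terminated. */
static void join_strings(char *buf, size_t size,
                         const char *const things[], int n,
                         const char *sep)
{
        assert(buf != NULL && size > 0);
        size_t used = 0;
        buf[0] = '\0';
        const char *prefix = "";        /* nothing before the first item */
        for (int i = 0; i < n && used < size; i++) {
                int k = snprintf(buf + used, size - used, "%s%s",
                                 prefix, things[i]);
                if (k < 0)
                        break;          /* encoding error: stop */
                used += (size_t)k;
                prefix = sep;           /* every later item gets sep */
        }
}
```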
Hanson's idiom for constant-time string comparison
Hanson's Atom_new or Atom_string functions use a shared hash table to ensure that equal strings are represented by identical pointers. A single Atom_new is more expensive than a single strcmp, but when you are using strings in data structures, you will recover the cost by saving comparisons down the road. You may also save memory. One idiom is:
const char *s1, *s2; /* strings in the neighborhood */
s1 = Atom_string(s1); /* hash strings to unique pointers */
s2 = Atom_string(s2);
...
if (s1 == s2) {
...
/* strings are guaranteed equal */
...
}
The behavior is neatly expressed mathematically: Atom_string(s1) == Atom_string(s2) if and only if strcmp(s1, s2) == 0.
You use Atom_new if you want to create atomic strings that contain zero bytes (i.e., '\0', aka the NUL character), as you might in some binary network protocols. Atom_new takes an explicit length, so it does not stop at the first '\0'.
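To show where the constant-time comparison comes from, here is a minimal interning table in the spirit of Atom_string. It is a hypothetical sketch, not Hanson's implementation: the name intern, the table size, and the hash function are illustrative, entries are never freed, and like Atom_string it handles only NUL-terminated strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One chain entry per distinct string ever interned. */
struct bucket {
        struct bucket *link;
        char *str;
};

#define NBUCKETS 2048
static struct bucket *buckets[NBUCKETS];

static unsigned long hash_string(const char *s)
{
        unsigned long h = 5381;                 /* djb2-style hash */
        for (; *s != '\0'; s++)
                h = h * 33 + (unsigned char)*s;
        return h;
}

/* Returns the one shared copy of s, so equal strings give equal pointers. */
const char *intern(const char *s)
{
        assert(s != NULL);
        unsigned long h = hash_string(s) % NBUCKETS;
        for (struct bucket *p = buckets[h]; p != NULL; p = p->link)
                if (strcmp(p->str, s) == 0)
                        return p->str;          /* seen before: same pointer */
        struct bucket *b = malloc(sizeof(*b));
        assert(b != NULL);
        size_t len = strlen(s);
        b->str = malloc(len + 1);
        assert(b->str != NULL);
        memcpy(b->str, s, len + 1);             /* copy includes the '\0' */
        b->link = buckets[h];
        buckets[h] = b;
        return b->str;
}
```

The strcmp calls are paid once, at interning time; afterward, comparing two interned strings is a single pointer comparison.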
Using List_map to print strings separated by newlines
For a list we'd like to write:
const char *prefix = "";
foreach name in list { /* the iteration abstraction does not exist in C */
printf("%s%s", prefix, name);
prefix = "\n";
}
Using the List_map interface, name will be pointed to by a parameter, and prefix will have to be stored in the "closure" state, which persists across iterations.
Here's the state:
/* Loop state for print_list() function */
struct print_list_state {
const char *prefix;
};
And here's the apply function:
/*
 * Prints the elements of a list separated by newlines.
 * Intended to be passed to List_map().
 * The caller must reset prefix in the loop state before each traversal.
 */
static void print_list_apply(void **list_node, void *loop_state_closure)
{
        struct print_list_state *state = loop_state_closure;
        /*
         * NB: list_node is &(lst->first), where lst is the current
         * list cell, so *list_node is lst->first
         */
        char *name = *list_node;
        /* First element isn't preceded by a newline; the rest are */
        printf("%s%s", state->prefix, name);
        state->prefix = "\n";
}
Now we rewrite the code to assign the initial state and then call List_map in place of the loop:
struct print_list_state s;
s.prefix = "";
List_map(list, print_list_apply, &s);
Again, the prefix is still a newline after the call to List_map, so s.prefix must be reset before the list is printed again. A good idea is to write a print function that encapsulates the state structure and the call to List_map.
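Here is one way to write that print function. Because Hanson's List interface is not reproduced here, the sketch assumes hypothetical stand-ins, struct cell and list_map, where list_map calls apply(&cell->first, cl) on each cell as List_map does; the state also carries a FILE * so the output can go anywhere. The state is a local variable, so every call starts with an empty prefix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for a List_T cell and List_map(). */
struct cell {
        void *first;
        struct cell *rest;
};

static void list_map(struct cell *list,
                     void apply(void **x, void *cl), void *cl)
{
        for (; list != NULL; list = list->rest)
                apply(&list->first, cl);        /* pass &first, as in the text */
}

/* Loop state: where to print, and what goes before the next element. */
struct print_list_state {
        FILE *out;
        const char *prefix;
};

static void print_list_apply(void **list_node, void *loop_state_closure)
{
        struct print_list_state *state = loop_state_closure;
        char *name = *list_node;                /* void * to char *, no cast */
        fprintf(state->out, "%s%s", state->prefix, name);
        state->prefix = "\n";
}

/* Encapsulates the state, so callers never have to reset prefix. */
void print_list(FILE *out, struct cell *list)
{
        struct print_list_state s = { out, "" };
        list_map(list, print_list_apply, &s);
}
```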
More detail on the indirections:
The indirections look like this:
- If lst->first points to the sequence of characters "hello", then &(lst->first) points to lst->first.
- The void **list_node that is passed to print_list_apply is &(lst->first).
- *list_node has type void * and is lst->first, so we can assign it to name without a cast.
Idioms for void ** pointers
A void ** pointer almost always means "pass by reference" a pointer to an unknown type. Here are a couple of idioms:
- To produce a value of type void **, always use &p, where p is a pointer of type void *.
- To consume or observe a value of type void **, first you have to know what the unknown type is. For the sake of argument, let us assume that the unknown type is struct date. Then you dereference the void ** pointer and put the result in a pointer of the correct type:
static struct date *d;
void set_d_by_reference(void **ref)
{
        d = *ref;
}
No cast is needed, because *ref has type void * and so can be assigned to any object pointer variable.
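A common use of both idioms is a lookup function that passes its result back by reference. In this minimal sketch, struct entry and lookup are hypothetical: the caller produces the void ** with &v, where v is a void *, and then consumes the result by assigning v to a struct date *.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct date {
        int year, month, day;
};

/* A tiny key/value array standing in for a real table. */
struct entry {
        const char *key;
        void *value;
};

/* If key is present, stores its value through *result and returns 1;
 * otherwise leaves *result alone and returns 0. */
static int lookup(const struct entry *entries, size_t n,
                  const char *key, void **result)
{
        for (size_t i = 0; i < n; i++) {
                if (strcmp(entries[i].key, key) == 0) {
                        *result = entries[i].value;  /* pass back by reference */
                        return 1;
                }
        }
        return 0;
}
```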
Type abbreviations for structure types
Idiom #1: Hanson style (the type abbreviation includes a pointer):
typedef struct Foo *Foo;
...
Foo f;
Idiom #2: Bell Labs style (the type
abbreviation does not include a pointer):
typedef struct foo foo;
...
foo *f;
Poison: mixing the two styles!
Note: you'll sometimes see capitalized names used with Bell Labs style. I've never seen lowercase names used with Hanson style.
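For comparison, here are both styles side by side, each used consistently for its own type. The types Counter and point and their constructors are hypothetical examples. In Hanson style the abbreviation already is a pointer, so no * appears at the use site; in Bell Labs style the * is always written out.

```c
#include <assert.h>
#include <stdlib.h>

/* Idiom #1, Hanson style: Counter is a pointer type. */
typedef struct Counter *Counter;
struct Counter {
        int count;
};

/* Idiom #2, Bell Labs style: point names the structure itself. */
typedef struct point point;
struct point {
        int x, y;
};

static Counter Counter_new(void)
{
        Counter c = malloc(sizeof(*c));     /* *c is a struct Counter */
        assert(c != NULL);
        c->count = 0;
        return c;
}

static point *point_new(int x, int y)
{
        point *p = malloc(sizeof(*p));      /* the pointer is explicit */
        assert(p != NULL);
        p->x = x;
        p->y = y;
        return p;
}
```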
Using an abstraction defined in an interface Foo
Many abstractions require memory allocated dynamically on the heap. To use such abstractions correctly, without leaking memory, you must balance every allocation with a free. Here is a recipe:
Foo_T foo = Foo_new(...arguments...);
assert(foo != NULL);
...
/* operations on foo, including calling functions that use foo */
...
Foo_free(&foo); /* free *foo and set foo to NULL */
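Here is what the other side of that contract might look like: a minimal sketch of a hypothetical abstraction Buf written in the Foo_new/Foo_free style. Passing &foo lets the free function set the caller's handle to NULL, so a later accidental use dereferences NULL rather than freed memory, which is usually much easier to catch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical abstraction: a fixed-size, zero-filled byte buffer. */
typedef struct Buf_T *Buf_T;
struct Buf_T {
        size_t size;
        char *bytes;
};

Buf_T Buf_new(size_t size)
{
        Buf_T b = malloc(sizeof(*b));
        assert(b != NULL);
        b->size = size;
        b->bytes = calloc(size, 1);
        assert(size == 0 || b->bytes != NULL);
        return b;
}

/* Frees everything *bp owns, then sets the caller's handle to NULL. */
void Buf_free(Buf_T *bp)
{
        assert(bp != NULL && *bp != NULL);
        free((*bp)->bytes);
        free(*bp);
        *bp = NULL;
}
```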
Handling void * values of known type
Suppose we are using qsort to sort an array of strings, where each string is represented by a char * pointer. No sane person wants to deal with typecasting, so we write the comparison function this way:
static int compare(const void *p1, const void *p2)
{
const char * const *ps1 = p1;
const char * const *ps2 = p2;
return strcmp(*ps1, *ps2);
}
The idiom is that if you have a pointer-to-void (void *) value of known type, you immediately assign it to a variable of that type. No explicit cast is needed. Here's another example:
struct node *p = Table_get(t, Atom_string("root"));
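Putting the comparison function to work, here is a minimal sketch of sorting an array of strings with qsort. The wrapper name sort_strings is hypothetical. Note that qsort hands the comparison function pointers to the array elements, so each const void * really points to a char * pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int compare(const void *p1, const void *p2)
{
        const char * const *ps1 = p1;   /* known type: assign, no cast */
        const char * const *ps2 = p2;
        return strcmp(*ps1, *ps2);
}

/* Sorts n strings into increasing strcmp order, in place. */
static void sort_strings(const char *strings[], size_t n)
{
        qsort(strings, n, sizeof(strings[0]), compare);
}
```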
Using unboxed arrays
Suppose array a is an unboxed UArray_T containing values of type struct pixel. Here's how you get a pointer to element i:
struct pixel *p = UArray_at(a, i); /* capture pointer into the array
                                    * (valid until resized or freed)
                                    */
assert(sizeof(*p) == UArray_size(a));
... *p ... /* use expression of struct type */
This idiom is robust against changes in type. Why? It has a single point of truth, that is, the type is mentioned in only one place. In Hanson's book, you will see types mentioned in multiple places, and it is easier to write inconsistent code.
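Hanson's UArray interface is not reproduced here, so the following sketch uses a hypothetical stand-in (struct uarray with uarray_new, uarray_at, and uarray_free) that stores elements inline, the way an unboxed array does. The function fill_gray shows the capture-pointer idiom, including storing a struct value through the pointer; all of these names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an unboxed array (NOT Hanson's UArray_T). */
struct uarray {
        int length;     /* number of elements */
        int size;       /* bytes per element */
        char *elems;    /* length * size bytes, elements stored inline */
};

static struct uarray *uarray_new(int length, int size)
{
        assert(length > 0 && size > 0);
        struct uarray *a = malloc(sizeof(*a));
        assert(a != NULL);
        a->length = length;
        a->size = size;
        a->elems = calloc((size_t)length, (size_t)size);
        assert(a->elems != NULL);
        return a;
}

/* Like UArray_at: returns a pointer to element i, not the element. */
static void *uarray_at(struct uarray *a, int i)
{
        assert(i >= 0 && i < a->length);
        return a->elems + (size_t)i * (size_t)a->size;
}

static void uarray_free(struct uarray **ap)
{
        free((*ap)->elems);
        free(*ap);
        *ap = NULL;
}

struct pixel {
        unsigned char red, green, blue;
};

static struct pixel gray(unsigned char level)
{
        struct pixel px = { level, level, level };
        return px;
}

/* Stores a gray pixel in every element using the capture-pointer idiom. */
static void fill_gray(struct uarray *a, unsigned char level)
{
        for (int i = 0; i < a->length; i++) {
                struct pixel *p = uarray_at(a, i);  /* type named once */
                assert(sizeof(*p) == (size_t)a->size);
                *p = gray(level);                   /* store through p */
        }
}
```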
Initializing array elements
Here's an idiom for initializing element i of an unboxed array with an empty Set_T:
Set_T *setp = UArray_at(array, i);
assert(sizeof(*setp) == UArray_size(array));
*setp = Set_new(10, NULL, NULL);
Storing values into an unboxed array
Suppose function f() returns a value of type struct pixel that you want to store in an unboxed array. Here's how you do it:
struct pixel *p = UArray_at(a, i); /* capture pointer into the array
                                    * (valid until resized or freed)
                                    */
assert(UArray_size(a) == sizeof(*p)); /* detects some errors */
*p = f();
Example of array of arrays
Here I initialize and use an array of arrays. Suppose you want a UArray_T of UArray_T of double:
/* n elements of type UArray_T */
UArray_T outer = UArray_new(n, sizeof(UArray_T));
for (int i = 0; i < n; i++) {
        /* variable number of elements in each inner array */
        UArray_T inner_array = UArray_new(length_of_row(i),
                                          sizeof(double));
        for (int j = 0; j < UArray_length(inner_array); j++) {
                double *elemp = UArray_at(inner_array, j);
                *elemp = 0.0; /* initial value of element */
        }
        /* point to slot for inner array */
        UArray_T *innerp = UArray_at(outer, i);
        assert(sizeof(*innerp) == UArray_size(outer));
        *innerp = inner_array;
}
Now to access element (row, col), remember that UArray_at returns a pointer to an element:
UArray_T *p = UArray_at(outer, row);
assert(sizeof(*p) == UArray_size(outer));
UArray_T inner_array = *p;
double *q = UArray_at(inner_array, col);
assert(sizeof(*q) == UArray_size(inner_array));
return *q;
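Here is a runnable version of the array-of-arrays example. It again uses a hypothetical stand-in for UArray_T (struct uarray with uarray_new and uarray_at), so it is a sketch of the idiom rather than Hanson's code. Row i has i + 1 elements, playing the role of length_of_row, and the outer array stores pointers to the inner arrays.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal stand-in for an unboxed array (NOT UArray_T). */
struct uarray {
        int length, size;       /* element count, bytes per element */
        char *elems;
};

static struct uarray *uarray_new(int length, int size)
{
        struct uarray *a = malloc(sizeof(*a));
        assert(a != NULL);
        a->length = length;
        a->size = size;
        a->elems = calloc((size_t)length, (size_t)size);
        assert(length == 0 || a->elems != NULL);
        return a;
}

static void *uarray_at(struct uarray *a, int i)
{
        assert(i >= 0 && i < a->length);
        return a->elems + (size_t)i * (size_t)a->size;
}

/* Ragged array of arrays: row i holds i + 1 doubles, all 0.0. */
static struct uarray *triangle_new(int n)
{
        struct uarray *outer = uarray_new(n, sizeof(struct uarray *));
        for (int i = 0; i < n; i++) {
                struct uarray *inner = uarray_new(i + 1, sizeof(double));
                for (int j = 0; j < inner->length; j++) {
                        double *elemp = uarray_at(inner, j);
                        *elemp = 0.0;                   /* initial value */
                }
                struct uarray **innerp = uarray_at(outer, i); /* row slot */
                assert(sizeof(*innerp) == (size_t)outer->size);
                *innerp = inner;
        }
        return outer;
}

/* Pointer to element (row, col), checking sizes at each level. */
static double *triangle_at(struct uarray *outer, int row, int col)
{
        struct uarray **p = uarray_at(outer, row);
        assert(sizeof(*p) == (size_t)outer->size);
        double *q = uarray_at(*p, col);
        assert(sizeof(*q) == (size_t)(*p)->size);
        return q;
}

static void triangle_free(struct uarray *outer)
{
        for (int i = 0; i < outer->length; i++) {
                struct uarray **p = uarray_at(outer, i);
                free((*p)->elems);
                free(*p);
        }
        free(outer->elems);
        free(outer);
}
```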
Allocating memory
The following anti-idiom, although seen frequently in the code of those who have not been taught better, is anathema to C programmers:
Thing_T p = malloc(sizeof(struct Thing_T)); /* not acceptable */
assert(p != NULL);
Such code is not acceptable for CS 40:
- It's easy to leave out the struct, in which case sizeof measures only a pointer, too little memory is allocated, and you have a memory error. valgrind will probably catch it, but it shouldn't happen in the first place.
- There is no single point of truth about what the type of p is. In particular, suppose the program evolves to this:
NewThing_T p = malloc(sizeof(struct Thing_T)); /* actually wrong */
assert(p != NULL);
The compiler accepts this code, but the size no longer matches the type of p.
Here is the established idiom instead:
Thing_T p = malloc(sizeof(*p)); /* established C idiom */
assert(p != NULL);
This code is good because:
- There is a single point of truth about the type of p. If that type changes, the code adjusts automatically and is still correct.
- If the name of p changes, you have at least a fighting chance of getting a decent error message from the compiler.
Hanson's NEW macro packages the same idiom:
Thing_T p;
NEW(p); // the assertion is included in NEW
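The same idiom extends to arrays: multiply by sizeof(*p). In this minimal sketch, struct thing and things_new are hypothetical; the assertion guards against a zero count (for which malloc may return NULL) and against overflow in the multiplication.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct thing {
        int id;
        double weight;
};

/* Allocates n things. The element type appears only in the declaration
 * of p, so sizeof(*p) follows it if that type ever changes. */
static struct thing *things_new(size_t n)
{
        struct thing *p;
        assert(n > 0 && n <= SIZE_MAX / sizeof(*p));
        p = malloc(n * sizeof(*p));
        assert(p != NULL);
        for (size_t i = 0; i < n; i++) {
                p[i].id = (int)i;
                p[i].weight = 0.0;
        }
        return p;
}
```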
Writing long string literals
It can be difficult to cram a long printf or fprintf call into 80 columns. Exploit this unusual property of C: adjacent string literals are concatenated at compile time. Example:
fprintf(stderr, "%s: Things have gone horribly wrong: "
"%s is a file format I don't recognize, "
"I can't find any bytes on standard input, "
"and the dog ate %d pages of my homework!\n",
argv[0], argv[1], n - 1);
Note especially that there are no commas between the literals.
This idiom also enables many scurvy tricks with the C preprocessor.
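As a small sketch of both points: the first declaration below is a single string built from two adjacent literals, and the second uses the preprocessor's # operator to produce a literal that is then concatenated with its neighbors. The names usage and version, the option text, and the macros are hypothetical. The two-level macro is needed so that VERSION_MAJOR is expanded before it is turned into a string.

```c
#include <assert.h>
#include <string.h>

/* Adjacent literals are joined at compile time: this is ONE string. */
static const char usage[] =
        "usage: prog [-v] file...\n"
        "  -v  print each file name as it is processed\n";

/* # turns a macro argument into a string literal, which then joins the
 * literals around it. The extra level expands VERSION_MAJOR first. */
#define VERSION_MAJOR 2
#define STRINGIFY(x) #x
#define EXPAND_AND_STRINGIFY(x) STRINGIFY(x)
static const char version[] =
        "version " EXPAND_AND_STRINGIFY(VERSION_MAJOR) ".0";
```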
Printing integers of known width
The designers of C, unlike the designers of Java, decided that programmers did not need to know how many bits are in a type like int or unsigned long. As a result, it is nearly impossible to write code that is portable against changes in the size of a machine word.
This problem was fixed in C99 with the introduction of the <stdint.h> and <inttypes.h> interfaces. <inttypes.h> includes <stdint.h>, so it provides a variety of exact-width integer types such as int16_t and uint64_t, along with some print macros. Most of you will need it only for printing. Here are some examples of the C99 idiom for printing integers of known sizes:
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
int main(void)
{
        uint64_t big = (uint64_t)1 << 63;
        int16_t negative = ~(int16_t)0;
        printf("%" PRIu64 " is a large number, as we can see by its"
               " hex\nrepresentation 0x%016" PRIx64 ".\n%" PRId16 " is a"
               " negative number of very small magnitude.\n",
               big, big, negative);
        return EXIT_SUCCESS;
}
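Macros PRIu64, PRIx64, and PRId16 are like a regular u, x, or d conversion, except correctly sized for 64-bit, 64-bit, and 16-bit integers, respectively. Each macro expands to a string literal, which is why it can be spliced into a format string by adjacent-literal concatenation.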
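Because the macros are string literals, they work with snprintf just as well as with printf. This minimal sketch, with the hypothetical name format_pair, formats a signed and an unsigned 64-bit value into a buffer; the test uses the extreme values INT64_MIN and UINT64_MAX.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Formats s in decimal and u in hex into buf; returns snprintf's result. */
static int format_pair(char *buf, size_t size, int64_t s, uint64_t u)
{
        return snprintf(buf, size, "%" PRId64 " and 0x%" PRIx64, s, u);
}
```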