This document collects some idiomatic examples of the C way of doing things. None of these examples have been tested. Please report errors or difficulties to comp40-staff.
The idea is to separate out the processing of an open file handle from the process of finding one or more open file handles. Idiom adapted from Kernighan and Ritchie, page 162, for a program without options:
#include <stdio.h>
#include <stdlib.h>

extern void do_something(FILE *);

int main(int argc, char *argv[]) {
    if (argc == 1) {
        do_something(stdin);
    } else {
        for (int i = 1; i < argc; i++) {
            FILE *fp = fopen(argv[i], "r");
            if (fp == NULL) {
                fprintf(stderr, "%s: Could not open file %s for reading\n",
                        argv[0], argv[i]);
                exit(1);
            }
            do_something(fp);
            fclose(fp);
        }
    }
    return 0;
}
Idiom for OS-specific error message on failing to open a file:
perror(argv[i]); // print the filename with message about *why* fopen() failed
Idiom for elaborate error message:
...
fprintf(stderr, "%s: could not open %s (%s)\n",
argv[0], argv[i], strerror(errno));
The correct idiom for reading input is to
Allocate a buffer
Call fgets
Allocation may be static or dynamic. The main issue is how to recover if fgets does not return an entire line. Assuming you can't just halt the program with an error message, these are your options:
If your buffer is dynamically allocated, you can enlarge it and continue to read.
If your buffer is statically allocated, you should
Here are some things to avoid when doing input one line at a time:
Never use gets; it's unsafe.
Don't use scanf, especially not for interactive programs. It's too easy for scanf to become greedy and gobble up more than one line, especially if the input doesn't meet specifications.
If you have the urge to use the scanf interface, which can be quite useful, use fgets to read the line and then sscanf (note the extra 's') to read the pieces.
Gotcha alert! The C++ strings you are used to do not exist in C. In C, you simulate a string by a char * pointer, which points to a sequence of bytes ending in '\0'. This style causes all sorts of problems, most notably
const char *s1, *s2; // strings in the neighborhood
if (s1 == s2) { // SILENTLY GIVES WRONG ANSWERS
... // go here only if *pointers* are identical
}
The standard idiom for comparing strings is
if (strcmp(s1, s2) == 0) {
... // strings are equal here
}
The seasoned C programmer often writes
if (!strcmp(s1, s2)) {
... // strings are equal here
}
The exclamation mark is easy to overlook.
Problem: comparing equal strings costs time proportional to the length of the string.
Hanson's Atom_new or Atom_string functions use a shared hash table to ensure that equal strings are represented by identical pointers. A single Atom_new is more expensive than a single strcmp, but when you are using strings in data structures, you will recover the cost by saving comparisons down the road. You may also save memory. One idiom is
const char *s1, *s2; // strings in the neighborhood
s1 = Atom_string(s1); // hash string to a unique pointer
s2 = Atom_string(s2); // likewise
...
if (s1 == s2) {
... // strings are guaranteed equal
}
The behavior is neatly expressed mathematically: Atom_string(s1) == Atom_string(s2) if and only if strcmp(s1, s2) == 0.
You use Atom_new if you want to create atomic strings that contain zeroes, as you might in some binary network protocols.
This idiom is notable for its simple control flow. It can be generalized to other separators besides commas and other things to print besides strings.
...
const char *prefix = "";
for (int i = 0; i < nthings; i++) {
printf("%s%s", prefix, things[i]);
prefix = ", ";
}
List_map to print strings separated by commas
For a list we'd like to write
const char *prefix = "";
foreach name in list { // the iteration abstraction does not exist in C
    printf("%s%s", prefix, name);
    prefix = ", ";
}
Using the List_map interface, name will be pointed to by a parameter, and prefix will have to be stored in the "closure" state which persists across iterations.
Here's the state:
struct inner_state {
const char *prefix;
};
And here's the apply function:
static void inner_apply(void **x, void *cl) {
    struct inner_state *s = cl;
    char *name = *x; // x is &p->first, so *x is p->first
    printf("%s%s", s->prefix, name);
    s->prefix = ", "; // every later name is preceded by the separator
}
Now we rewrite the code to assign the initial state and then call List_map in place of the loop:
struct inner_state s;
s.prefix = "";
List_map(list, inner_apply, &s);
The indirections look like this:
p->first points to the sequence of characters "hello"
&p->first points to p->first
the void **x that is passed to inner_apply is &p->first
*x has type void * and is p->first, so we can assign it to char *name without a cast
void ** pointers
A void ** pointer almost always means "pass by reference a pointer to an unknown type." Here are a couple of idioms:
To produce a value of type void **, always use &p, where p is a pointer of type void *.
To consume or observe a value of type void **, first you have to know what the unknown type is. For the sake of argument, let us assume that the unknown type is struct date. Then what you do is dereference the void ** pointer and put the result in a pointer of the correct type:
static struct date *d;
void set_d_by_reference(void **ref) {
d = *ref;
}
No cast is needed, because *ref has type void * and so can be assigned to any pointer variable.
Idiom #1: Hanson style (the type abbreviation includes a pointer):
typedef struct foo *Foo;
...
Foo f;
Idiom #2: Bell Labs style (the type abbreviation does not include a pointer):
typedef struct foo foo;
...
foo *f;
Poison: mixing the two styles!
Note: you'll sometimes see capitalized names used with Bell Labs style. I've never seen lowercase names used with Hanson style.
Many abstractions require memory allocated dynamically on the heap. To use such abstractions correctly, without leaking memory, you must balance every allocation with a free. Here is a recipe:
Foo_T foo = Foo_new(...arguments...);
assert(foo != NULL);
... operations on foo, including calling functions that use foo ...
Foo_free(&foo); // free *foo and set foo to NULL
void * values of known type
Suppose we are using qsort to sort an array of strings, where each string is represented by a char * pointer. No sane person wants to deal with typecasting, so we write the comparison function this way:
static int compare(const void *p1, const void *p2) {
const char * const *ps1 = p1;
const char * const *ps2 = p2;
return strcmp(*ps1, *ps2);
}
The idiom is that if you have a void * value of known type, you immediately assign it to a variable of that type. No explicit cast is needed. Here's another example:
struct node *p = Table_get(t, Atom_string("root"));
Be alert that I have retired Hanson's Array abstraction. I have replaced it with UArray, an abstraction of unboxed arrays. (The implementation is exactly what's in the book, but the interface is different.)
Suppose array a is an unboxed UArray_T containing values of type struct pixel. Here's how you get a pointer to the ith element:
The Ramsey approach:
struct pixel *p = UArray_at(a, i); // capture pointer into the array
// (valid until resized or freed)
assert(sizeof(*p) == UArray_size(a));
... *p ... // use expression of struct type
This idiom is robust against changes in type. Why? It has a single point of truth, that is, the type is mentioned in only one place. In Hanson's book, you will see types mentioned in multiple places, and it is easier to write inconsistent code. That's one reason I've retired Hanson's arrays. (The other is that students found them hard to learn.)
Here's an idiom for initializing element i of an array with an empty Set_T:
Set_T *setp = UArray_at(array, i);
assert(sizeof(*setp) == UArray_size(array));
*setp = Set_new(10, NULL, NULL);
Suppose function f() returns a value of type struct pixel that you want to store in an unboxed array. Here's how you do it:
struct pixel *p = UArray_at(a, i); // capture pointer into the array
                                   // (valid until resized or freed)
assert(UArray_size(a) == sizeof(*p)); // detects some errors
*p = f();
Here I initialize and use an array of arrays. Suppose you want a UArray_T of UArray_T of double:
UArray_T outer = UArray_new(n, sizeof(UArray_T)); // n elements of type UArray_T
for (int i = 0; i < n; i++) {
    UArray_T inner_array = UArray_new(length_of_row(i), sizeof(double));
    // variable number of elements in each inner array
    for (int j = 0; j < UArray_length(inner_array); j++) {
        double *elemp = UArray_at(inner_array, j); // point to element
        *elemp = 0.0; // initial value of element
    }
    UArray_T *innerp = UArray_at(outer, i); // point to slot for inner array
    *innerp = inner_array;
}
Now to access element (i, j), I remember that UArray_at returns a pointer to an element:
UArray_T *p = UArray_at(outer, i);
assert(sizeof(*p) == UArray_size(outer));
UArray_T inner_array = *p;
double *q = UArray_at(inner_array, j);
assert(sizeof(*q) == UArray_size(inner_array));
return *q;
The following anti-idiom, although seen frequently in the code of those who have not been taught better, is anathema to C programmers:
Thing_T p = malloc(sizeof(struct Thing_T)); // not acceptable
assert(p);
Such code is not acceptable for COMP 40:
It's easy to leave out the struct, in which case you have a memory error. valgrind will probably catch it, but it shouldn't happen in the first place.
There is no single point of truth about what the type of p is. In particular, suppose the program evolves to this:
NewThing_T p = malloc(sizeof(struct Thing_T)); // actually wrong
assert(p);
The correct way to write these allocations is with a single point of truth:
Thing_T p = malloc(sizeof(*p)); // established C idiom
assert(p);
This code is good because
There is a single point of truth about the type of p.
If that type changes, the code adjusts automatically and is still correct.
If the name of p changes, you have at least a fighting chance of getting a decent error message from the compiler.
This business of allocation and deallocation is so tricky that we recommend you use Hanson's macros:
Thing_T p;
NEW(p); // the assertion is included in NEW
It can be difficult to cram a long printf or fprintf call into 80 columns. Exploit this unusual property of C: adjacent string literals are concatenated at compile time.
Example:
fprintf(stderr, "%s: Things have gone horribly wrong: "
"%s is a file format I don't recognize, "
"I can't find any bytes on standard input, "
"and the dog ate %d pages of my homework!\n",
argv[0], argv[1], n-1);
Note especially there are no commas between the literals.
This idiom also enables many scurvy tricks with the C preprocessor.
The designers of C, unlike the designers of Java, decided that programmers did not need to know how many bits are in a type like int or unsigned long. As a result, it is nearly impossible to write code that is portable against changes in the size of a machine word.
This problem was fixed in C99 with the introduction of the <inttypes.h> interface. Most of you will need it only for printing.
Here are some examples of the C99 idiom for printing integers of known sizes:
#include <inttypes.h>
#include <stdio.h>

int main(void) {
    uint64_t big = (uint64_t)1 << 63;
    int16_t negative = ~(int16_t)0;
    printf("%" PRIu64 " is a large number, as we can see by its"
           " hex\nrepresentation 0x%016" PRIx64 ".\n%" PRId16 " is a"
           " negative number of very small magnitude.\n",
           big, big, negative);
    return 0;
}
Macros PRIu64, PRIx64, and PRId16 are like a regular u, x, or d, except correctly sized for 64-bit, 64-bit, and 16-bit integers respectively.