Comp 40:
Coding Standards

Why coding standards are important

Computer code is read by people as well as by machines, and the best programmers take pride in producing code that is appealing to look at and easy to understand. There are many benefits to writing code in a clear, consistent manner, and to ensuring that the layout of your source code reflects its underlying logical structure. These benefits include:

Of course, all the sound principles of abstraction and modularity apply as well!

Your structure and organization grade will suffer substantially if your code doesn't observe the standards set out here. So, read all of this material (including Linus's rules) carefully. If you have any doubts, ask. (I'm sorry to be so strict, but experience shows that most students simply disregard the requirement unless there is a serious grading implication.)

Requirements for COMP 40 code

All COMP 40 code must strictly observe these requirements:

Language version and compiler switches

Code you submit must be accepted by the appropriate compiler without errors or warnings. If you are using gcc, use these options:

  gcc -std=c99 -pedantic -Wall -Wextra -Werror

Comp 40 uses ANSI C-99. However, with a few exceptions that we'll address as time goes on, stick with the more restrictive C-90 standard.

  1. Though gcc allows other forms, you must use only /* ... */ style comments. Do not use //. See the section on comments below for more on comments.
  2. Local variables should be introduced at the start of a block. That means right after the function header (at the start of the function body), or at the start of a group of statements inside curly braces. This one can be very challenging for Java/C++ programmers to get used to! This has you document what things will be stored in a block and also corresponds more closely to what happens in the assembly/machine code.
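
As a rough sketch of both rules together (the function sum_array is made up for illustration and is not part of any assignment), C-90 style declarations and comments might look like this:

```c
#include <stddef.h>

/* Returns the sum of the n integers in values. */
int sum_array(const int *values, size_t n)
{
        int total = 0;          /* declared at the start of the function */
        size_t i;

        for (i = 0; i < n; i++) {
                int v = values[i];      /* declared at the start of a block */

                total += v;
        }
        return total;
}
```

Note that i is declared at the top of the function rather than inside the for header, which is the C-99 form many Java/C++ programmers reach for first.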

You must not use library routines that are non-standard extensions provided by GNU. Many but not all such extensions require you to #define _GNU_SOURCE, which you must not do. Also: for the intro homework assignment, you MUST NOT use or consult the man pages or documentation for the library routines getline or getdelim. FYI: in earlier years these were prohibited by the above rule that prohibits use of GNU extensions to C or the libraries. They were recently standardized, but for the intro assignment you must not use them.

No runtime faults

A program that dumps core, e.g. due to a "segmentation fault", earns No Credit for functional correctness, unless the assignment specifically allows for such failures. Certain assignments allow for assertion failures or, to use Hanson's term, "Checked Runtime Errors" (CREs); the default handler for such exceptions does dump core. There is no deduction in grade for such core dumps if the assignment allows for such assertions or exceptions.

Valgrind

To earn a grade of Very Good or better, code must run under valgrind without leaks or errors. There is one exception (pun intended) to this policy: in situations where the assignment allows termination with a checked runtime exception and where your program actually raises such an exception, memory leaks will not reduce your grade. Note: you may find it helpful when checking for leaks to run valgrind with the following options:

      valgrind --leak-check=full --show-reachable=yes <your_program>

To earn a grade of Good or better, code must run under valgrind without errors.

Source code formatting

For this course, we will adopt the Linux kernel coding style. These are the rules that govern code that is written for the Linux kernel. Many of you will find the brace placement and/or the indentation will take time to get used to. That's ok (I had to switch, too). Remember, open curly braces at the start of a function go on a line by themselves, but other open curly braces go at the end of the line that opens the block. Yes, it's inconsistent. Follow Linus's conventions!
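
A small sketch of the brace rule, assuming 8-space indentation written with spaces (count_negatives is a hypothetical function used only to show the layout):

```c
#include <stddef.h>

/* Counts the negative entries in values and stores their sum in *sum. */
size_t count_negatives(const int *values, size_t n, long *sum)
{                                 /* function: brace on its own line */
        size_t count = 0;
        size_t i;

        *sum = 0;
        for (i = 0; i < n; i++) { /* loop: brace ends the line */
                if (values[i] < 0) {
                        count++;
                        *sum += values[i];
                }
        }
        return count;
}
```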

Your code must not wrap when displayed in 80 columns.

Your code must not contain tab characters. This is different from Linus's rules, but relieves us of the difficulty of having everyone set up their tabs the same. To see how to set up emacs to use spaces rather than tabs, see the notes on using your text editor for fast compiles.

Spread things out. Specifically:

I would like to use my eyes for a few more decades, and you'll find these little things help a lot at 4am. (You would follow these rules if you were writing a paper, and a program is a paper in a formal language written for humans to read.)

More on indentation: do not yield to temptation

Spreading things out and using large indentations make it difficult to keep to the 80 character line width, and you will be tempted to do bad things. For example, a function body with two nested if statements will have code indented 24 characters.

Do not violate the indentation rules in order to keep a bunch of code on one line. Breaking lines at appropriate places is good (as long as proper indentation is used). Do not squeeze space out of expressions (e.g., do not remove the spaces around operators, after commas, etc.). The following is bad:

     int x=2;
     int y=x*a_long_function_name(x*x,atoi(argv[1]),g(x+1,NULL));
 
This is much better:
     int x = 2;
     int y = x * a_long_function_name(x * x,
                                      atoi(argv[1]),
                                      g(x + 1, NULL));
 
You can tell how many arguments this function has even 6 feet from the terminal!

[By the way, never use atoi().]
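
The note above does not say what to use instead. One common choice, offered here as an assumption rather than a course rule, is strtol, which (unlike atoi) lets you detect input that is not a number or does not fit. A minimal sketch, where parse_int is a hypothetical helper:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parses s as a decimal int.  Returns 1 and stores
 * the value in *out on success; returns 0 if s is not a valid int.
 */
int parse_int(const char *s, int *out)
{
        char *end;
        long value;

        errno = 0;
        value = strtol(s, &end, 10);

        if (end == s || *end != '\0') {
                return 0;       /* no digits, or trailing junk */
        }
        if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
                return 0;       /* out of range for int */
        }
        *out = (int)value;
        return 1;
}
```

A caller would write if (!parse_int(argv[1], &n)) and report the bad argument, instead of silently computing with whatever atoi happened to return.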

Comments

In C, most single-line comments should look like this:
   /* I am an acceptable single-line comment. */
Very important single-line comments, and most multiple-line comments, should look like this:
   /*
    * I am a multi-line comment.  My indentation should match the indentation
    * of the code that surrounds me.
    */
Sometimes it's useful to set off major sections of code like this:

   /*****************************************************************
    *                  Data formatting functions
    *****************************************************************/

or like this:

   /*----------------------------------------------------------------*
    |
    |                  Data formatting functions
    |
    *----------------------------------------------------------------*/

Note that the above are legal C comments starting with /* and ending with */. Where a major portion of the code needs explanation or warnings, a box can be a good place to put it.

   /*****************************************************************
    *                  
    *                          login
    *                  
    *   Called when a user logs on.  This code is a little tricky, 
    *   because the new user structure we're creating may
    *   at the same time be accessed by other threads.
    *   Be sure to lock all accesses to structure members.
    *                  
    *****************************************************************/

Do not put big clunky boxes like this ahead of every function or all over your code. Use a hierarchy of commenting styles to make it easy for a reader to navigate the code. Boxes can be useful to set off major sections (though you should be suspicious: if code is long enough to require much of this, ask if it should be broken into multiple source files). Use the less intrusive style like this:

   /*
    * I am a multi-line comment.  My indentation should match the indentation
    * of the code that surrounds me.
    */

for individual functions, ahead of loops (if they require commenting), etc. Use "on the line" comments in cases where an individual line needs explanation. This can be particularly useful for variable declarations and initializations.

Sign your code

Begin every file you write with a comment block like this:

/*
 *     filename
 *     by name(s), date
 *     assignment
 *
 *     summary
 */ 
where filename is the name of the file, name(s) gives the name(s) of the file's author(s), date is the completion date, assignment tells which homework or project assignment this is for, and summary is a brief description of what the code in the file is for and hints about how it works (if that isn't obvious). This should include the relationship to other files where appropriate. For example, a C file that defines a resource (e.g., a hash table implementation) might give the name of the include file(s) that clients should use. The block may also include a change log for the benefit of the author(s). The exact format of this header doesn't matter as long as it is clear and consistent with the formatting of the rest of the source file (e.g. some files may use heavy boxes in one style or another, in which case it's fine to use that format).

Simple code is better

In order to keep code comprehensible, it is very important to minimize the number of things a reader has to keep in her head at any one time. This leads directly to two important coding conventions:

  1. Use more relatively short functions rather than fewer long functions — in other words, use procedural abstraction extensively.
  2. Keep the complexity of functions down. Do this by structuring your code so that the maximum nesting depth is on the order of 3 (it's hard to keep track of 5 pending conditions or the state of 5 nested loops). Another way to keep code simple is to encode complexity in data structures rather than algorithms whenever possible. Extensive documentation of a static structure definition is easier to understand than code with tons of possible execution paths.
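
To illustrate the second point, here is one way complexity can move out of control flow and into a table. This is only a sketch: the struct unit type, the units table and the unit_size function are all invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* One row per unit: the table, not the code, holds the special cases. */
struct unit {
        const char *name;       /* suffix the user types, e.g. "kb" */
        long bytes;             /* number of bytes in one such unit */
};

static const struct unit units[] = {
        { "b",  1L },
        { "kb", 1024L },
        { "mb", 1024L * 1024L },
        { "gb", 1024L * 1024L * 1024L }
};

/* Returns the bytes in one unit called name, or -1 if name is unknown. */
long unit_size(const char *name)
{
        size_t n = sizeof(units) / sizeof(units[0]);
        size_t i;

        for (i = 0; i < n; i++) {
                if (strcmp(units[i].name, name) == 0) {
                        return units[i].bytes;
                }
        }
        return -1;
}
```

Supporting a new unit means adding one row to the table. The loop has a single execution path no matter how many units exist, where a chain of if/else tests would grow with every addition.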

Leaving tracks when things aren't right

This is not a requirement for COMP 40, but in years of work on projects large and small, I have found this trick to be incredibly useful. I actually picked it up working on the kernel of the LOCUS distributed UNIX system many years ago.

In principle, all of us write only perfect code :-). If there's anything that isn't right, and especially anything we're not sure will work right, we fix it right away.

Of course in practice, this is far from true. We all occasionally compile, test and sometimes even (after carefully considering the costs and benefits!) release code that has known shortcomings. We also sometimes experiment with code about which we are suspicious: maybe we understood how to use that second parameter on the fopen library call properly, or maybe we're just hoping we've got it right and intend to test it later.

In all such cases, it's a terrific idea to mark any code that has shortcomings or that needs further attention with a uniform marker in comments. The one I use is: NEEDSWORK. So, a piece of code might have a comment like this:

   /*****************************************************************
    *                  
    *                     getUserName
    *                  
    *   NEEDSWORK: doesn't correctly handle names from Asian
    *              countries in which family name comes ahead of
    *              given name.
    *                  
    *****************************************************************/

...or...

 
    fopen("myfile", "rb"); /* NEEDSWORK: not sure if "rb" is right mode */

Marking problems in this way has a number of advantages:

The worst thing you can do with a known problem is to lose track of it: almost surely at some later time, and possibly at a very inconvenient time, you will have to reconstruct your knowledge of the problem the hard way. NEEDSWORK markings can help.

You may be wondering: won't I lose credit in COMP 40 if I point out bugs or misunderstandings? On the contrary, typically we will respect the fact that you are aware of the problem. Now, you can't avoid a low grade merely by putting NEEDSWORK next to every piece of terrible code you submit, but in general we will deduct less for a problem when you show us explicitly that you are aware of it.

BTW: "rb" on fopen means "read in binary"; certain line ending conversions that some systems perform on text files will not be performed. In fact, the "b" is ignored on Linux and many other modern UNIX systems.

Don't rely on puns

Alas, the Hanson book violates this constantly. But you won't.

The fact that C doesn't have a boolean type and uses integers for tests leads to lots of puns that make code less clear. If a function returns a boolean value (1 for true, 0 for false), then do not do an explicit comparison:

     if (is_numerical_expr(expr))
             return evaluate(expr);
 
Don't compare the result of is_numerical_expr(expr) to 1. (And if you want to know whether it's false, you should write !is_numerical_expr(expr).)

If an expression is really an integer expression, do make the comparison explicit. If n is an integer that counts down, write while (n > 0) ... rather than while (n) ... . Write if (strncmp(s, t, n) == 0) ... rather than if (!strncmp(s, t, n)) ....

Using pointer values as booleans is an even more obscure multi-layer pun! (The C standard does not require that the null pointer have the value 0: it just requires that when you write 0 in a pointer context, it's interpreted as the null pointer. Never write 0 for the null pointer, by the way. Write NULL.)

Write assert(p != NULL); rather than what you see in the book (assert(p)).
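
Putting these rules side by side, a short sketch (has_prefix and count_with_prefix are hypothetical names chosen for this example) might read:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if s starts with prefix, 0 otherwise (a boolean result). */
int has_prefix(const char *s, const char *prefix)
{
        assert(s != NULL);              /* not assert(s) */
        assert(prefix != NULL);

        /* strncmp returns an integer, so compare it to 0 explicitly */
        return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Counts how many of the n strings in words start with prefix. */
size_t count_with_prefix(const char **words, size_t n, const char *prefix)
{
        size_t count = 0;

        assert(words != NULL);
        while (n > 0) {                 /* not while (n) */
                n--;
                if (has_prefix(words[n], prefix)) {  /* boolean: no == 1 */
                        count++;
                }
        }
        return count;
}
```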

Use Emacs — it makes it easy

Set Emacs's C Indentation Style to linux. For an individual editing session, you can type

C-c . linux

It's better to set up emacs to do this for all C source code. See using your text editor for fast compiles to find out how.

When you are typing your code and you come to the end of a line, type C-j, rather than a return. If you do this, Emacs indents the next line automatically according to the indentation rules. Alternatively, hitting Tab will indent the current line according to the current rules. If you have edited a function and ruined the indentation (or you fear you may have brace or parenthesis problems), you can indent an entire block by placing the cursor over the opening curly brace of the block and typing M-C-q.

Some more thoughts on good code and documentation

In your comments and other documentation, find a way to explain your work that is appropriate to the size of the problem. For small problems, the best method of explanation is almost always comments in the source code. For larger problems, you must describe your thinking at least as well as your code. For COMP 40, these longer explanations should typically be in the README file, not in the code itself. It is often appropriate to include brief comments about the overall architecture of a module in the source code, especially if that structure is not immediately obvious from the content and layout of the source itself.

Modularity and good documentation are complementary

Good programming style and documentation are essential ingredients in a clear and readable assignment. Good style includes the use of functions, modules, and abstract data types where appropriate.

Use system library functions

Good style normally includes using library functions where possible, instead of duplicating effort by re-implementing something yourself. However, some libraries have such complex interfaces that you may be justified in avoiding them; when in doubt, ask the course staff.
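
For instance, rather than writing your own sort, you can usually hand the work to the standard library's qsort. The following is a sketch only; compare_ints and sort_ints are names made up for this example:

```c
#include <stdlib.h>

/* Comparison function for qsort: orders ints ascending. */
static int compare_ints(const void *a, const void *b)
{
        int x = *(const int *)a;
        int y = *(const int *)b;

        return (x > y) - (x < y);       /* avoids overflow of x - y */
}

/* Sorts the n ints in values into ascending order, in place. */
void sort_ints(int *values, size_t n)
{
        qsort(values, n, sizeof(values[0]), compare_ints);
}
```

The only code you have to get right yourself is the comparison function.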

Use good algorithms

Choose algorithms appropriate to the task at hand, and make your code structured and easy to read. Good use and placement of documentation is vital. Lots of comment text does not necessarily mean good documentation.

More on documentation

For earlier versions of COMP 40, Prof. Norman Ramsey prepared a write up giving some of his thoughts on good documentation. Occasionally, you will find minor differences from the advice given above, but overall it has many good suggestions that you will find useful in your work in COMP 40. It's available at How to Write Documentation for COMP 40.
Author: Noah Mendelsohn (based on earlier versions by Mark A. Sheldon & Norman Ramsey)
Last Modified: 24 January 2016