COMP40 Coding Standards

Why coding standards are important

Computer code is read by people as well as by machines, and the best programmers take pride in producing code that is appealing to look at and easy to understand. There are many benefits to writing code in a clear, consistent manner, and to ensuring that the layout of your source code reflects its underlying logical structure. These benefits include:
  • Errors are easier for you to spot when you name things consistently, indent consistently, etc.
  • Sooner or later your code will probably be used or maintained by others: they will be delighted if at first glance it's obvious how your code is arranged, and where to look for what.
  • Writing good clear comments helps you think clearly about your code.
  • If your naming, punctuation, and indentation are consistent, then automated tools can more easily search and manipulate your code. For example, in the COMP 40 coding standards, an open curly brace { alone on a line almost surely marks the start of a function definition. Editor macros and other tools can search for that.
  • In COMP 40 in particular, the grader reading your code will need to rapidly understand what you are trying to do, and how you did it.
Of course, all the sound principles of abstraction and modularity apply as well!

Your structure and organization grade will suffer substantially if your code doesn't observe the standards set out here. So, read all of this material carefully. If you have any doubts, ask. (I'm sorry to be so strict, but experience shows that most students simply disregard the requirement unless there is a serious grading implication.)

Requirements for COMP40 code

All COMP40 code must strictly observe these requirements:

Language version and compiler switches

Code you submit must be accepted by gcc without errors or warnings. COMP 40 uses ANSI C-99, so you will use these options:
  gcc -std=c99 -pedantic -Wall -Wextra -Werror
You must not use library routines that are non-standard extensions provided by GNU. Many but not all such extensions require you to #define _GNU_SOURCE, which you must not do. Also: for the intro homework assignment, you MUST NOT use or consult the man pages or documentation for the library routines getline or getdelim.

No runtime faults

A program that dumps core, e.g., due to a "segmentation fault", earns No Credit for functional correctness, unless the assignment specifically allows for such failures. Certain assignments allow for assertion failures or, to use Hanson's term, "Checked Runtime Errors" (CREs); the default handler for such exceptions does dump core. There is no deduction in grade for such core dumps if the assignment allows for such assertions or exceptions.

Valgrind

To earn a grade of Very Good or better, code must run under valgrind without leaks or errors. There is one exception to this policy: in situations where the assignment allows termination with a checked runtime exception and where your program actually raises such an exception, memory leaks will not reduce your grade. Note: you may find it helpful when checking for leaks to run valgrind with the following options:
      valgrind --leak-check=full --show-reachable=yes <your_program>
To earn a grade of Good or better, code must run under valgrind without errors.

Source code formatting

For this course, we will adopt a variant of the Linux kernel coding style. These are the rules that govern code that is written for the Linux kernel. Many of you will find that the brace placement and/or the indentation will take time to get used to. That's ok (I had to switch, too). Here are the formatting rules:
  • Your code must not wrap when displayed in 80 columns.
  • Your code must not contain tab characters. This is different from Linus's rules, but relieves us of the difficulty of having everyone set up their tabs the same. To see how to set up emacs or vim to use spaces rather than tabs, see the notes on using your text editor for fast compiles.
  • Open curly braces at the start of a function go on a line by themselves, but other open curly braces go at the end of the line that opens the block. Yes, it's inconsistent. Follow Linus's conventions!
  • Curly braces should be used for all conditional statements and loops, even if their body is a single line. This is different from Linus's rules, but protects you from errors that can occur when you change your code later.
  • Put spaces around infix operators, such as =, ==, +, *, etc. For example x = NULL; is right, and x=NULL; is wrong.
  • Put a space after every comma in an argument list and after every semi-colon in a for statement.
  • Put a space between the keywords if, for, while and the following parenthesis. They aren't functions, so don't write them like function calls.
  • Separate curly braces from adjacent, non-blank characters. That is, don't write if (test){. Write if (test) {.
I would like to use my eyes for a few more decades, and you'll find that many of these little things help a lot at 4am. (You would follow these rules if you were writing a paper, and a program is a paper in a formal language written for humans to read.)
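
As a small illustration of the formatting rules above, here is a sketch of a function laid out the way the rules ask. The function count_above and its parameters are made up for this example; they are not part of any assignment. Note the function's open brace alone on its line, the other open braces at the ends of their lines, braces around the one-line if body, spaces instead of tabs, and spaces around operators and after commas and semi-colons.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many of values[0..len-1] are greater than limit. */
int count_above(const int *values, size_t len, int limit)
{
        int count = 0;

        assert(values != NULL);

        for (size_t i = 0; i < len; i++) {
                if (values[i] > limit) {
                        count++;
                }
        }

        return count;
}
```

A call such as count_above(scores, 4, 70) (where scores is a hypothetical array of four ints) reads the same way: one space after each comma, none inside the parentheses.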

More on indentation: do not yield to temptation

Spreading things out and using large indentations make it difficult to keep to the 80 character line width, and you will be tempted to do bad things. Do not violate the indentation rules in order to keep a bunch of code on one line. Breaking lines at appropriate places is good (as long as proper indentation is used). Do not squeeze space out of expressions (e.g., do not remove the spaces around operators, after commas, etc.). The following is bad:

     int x=2;
   int y=x*a_long_function_name(x*x,atoi(argv[1]),g(x+1,NULL));
 
This is much better:

     int x = 2;
     int y = x * a_long_function_name(x * x,
                                      atoi(argv[1]),
                                      g(x + 1, NULL));
 
You can tell how many arguments this function has even 6 feet from the terminal!

Don't rely on puns

Alas, the Hanson book violates this constantly. But you won't.

The fact that C doesn't have a boolean type and uses integers for tests leads to lots of puns that make code less clear. If a function returns a boolean value (1 for true, 0 for false), then do not do an explicit comparison:

     if (is_numerical_expr(expr)) {
             return evaluate(expr);
     }
 
Don't compare the result of is_numerical_expr(expr) to 1. (And if you want to know whether it's false, you should write !is_numerical_expr(expr).)

If an expression is really an integer expression, do make the comparison explicit. If n is an integer that counts down, write while (n > 0) ... rather than while (n) ... . Write if (strncmp(s, t, n) == 0) ... rather than if (!strncmp(s, t, n)) ....

Using pointer values as booleans is an even more obscure multi-layer pun! (The C standard does not require that the null pointer have the value 0: it just requires that when you write 0 in a pointer context, it's interpreted as the null pointer. Never write 0 for the null pointer, by the way. Write NULL.) Write assert(p != NULL); rather than what you see in the book (assert(p)).
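
The three cases above (boolean results, integer results, and pointers) can be seen side by side in the following sketch. The functions has_prefix and count_with_prefix are hypothetical helpers invented for this illustration. The boolean result is tested directly, the integer results of strncmp and the countdown are compared explicitly, and pointers are compared against NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if s begins with prefix, 0 otherwise (a boolean result). */
int has_prefix(const char *s, const char *prefix)
{
        assert(s != NULL);       /* pointer: compare against NULL */
        assert(prefix != NULL);

        /* strncmp returns an integer, so compare it explicitly to 0 */
        return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Returns how many of the n strings in words begin with prefix. */
int count_with_prefix(const char *words[], int n, const char *prefix)
{
        int count = 0;

        assert(words != NULL);

        while (n > 0) {          /* integer countdown: explicit test */
                n--;
                if (has_prefix(words[n], prefix)) {  /* boolean: no == 1 */
                        count++;
                }
        }

        return count;
}
```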

Comments

Though gcc allows other forms, you must use only /* ... */ style comments. Do not use //. In your programs, most single-line comments should look like this:
   /* I am an acceptable single-line comment. */
Very important single-line comments, and most multiple-line comments, should look like this:
   /*
    * I am a multi-line comment.  My indentation should match the indentation
    * of the code that surrounds me.
    */
Sometimes it's useful to set off major sections of code like this:
   /*****************************************************************
    *                  Data formatting functions
    *****************************************************************/
or like this:
   /*----------------------------------------------------------------*
    |
    |                  Data formatting functions
    |
    *----------------------------------------------------------------*/
Note that the above are legal C comments starting with /* and ending with */.

Where a major portion of the code needs explanation or warnings, a box can be a good place to put it.

   /*****************************************************************
    *                  
    *                          login
    *                  
    *   Called when a user logs on.  This code is a little tricky, 
    *   because the new user structure we're creating may
    *   at the same time be accessed by other threads.
    *   Be sure to lock all accesses to structure members.
    *                  
    *****************************************************************/

Do not put big clunky boxes like this ahead of every function or all over your code. Use a hierarchy of commenting styles to make it easy for a reader to navigate the code. Boxes can be useful to set off major sections (though you should be suspicious: if code is long enough to require much of this, ask if it should be broken into multiple source files). Use the smaller style comments for individual functions, ahead of loops (if they require commenting), etc. Use "on the line" comments in cases where an individual line needs explanation. This can be particularly useful for variable declarations and initializations.

Sign your code

Begin every file you write with a comment block containing at least the following information:

/*
 *     filename
 *     by name(s), date
 *     assignment
 *
 *     summary
 */ 
...or if it's more appropriate to the style of the rest of your code, put this information in a box that begins the code, e.g.:

/**************************************************************
 *
 *                     filename
 *
 *     Assignment: assignment
 *     Authors:  name(s)
 *     Date:     date
 *
 *     summary
 *
 *     ...you may provide more information here about
 *        the program or file, its interfaces, etc.
 *
 **************************************************************/
In the above, filename is the name of the file, name(s) gives the name(s) of the file's author(s), date is the completion date, assignment tells which homework or project assignment this is for, and summary is a brief description of what the code in the file is for and hints about how it works (if that isn't obvious). This should include the relationship to other files where appropriate. For example, a C file that defines a resource (e.g., a hash table implementation) might give the name of the include file(s) that clients should use. The block may also include a change log for the benefit of the author(s).

The exact format of this header doesn't matter as long as it is clear and consistent with the formatting of the rest of the source file (e.g. some files may use heavy boxes in one style or another, in which case it's fine to use that format).

Simple code is better

In order to keep code comprehensible, it is very important to minimize the number of things a reader has to keep in their head at any one time. This leads directly to two important coding conventions:
  1. Use more relatively short functions rather than fewer long functions — in other words, use procedural abstraction extensively.
  2. Keep the complexity of functions down. Do this by structuring your code so that the maximum nesting depth is on the order of 3 (it's hard to keep track of 5 pending conditions or the state of 5 nested loops). Another way to keep code simple is to encode complexity in data structures rather than algorithms whenever possible. Extensive documentation of a static structure definition is easier to understand than code with tons of possible execution paths.
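
One way to picture "encode complexity in data structures rather than algorithms" is a lookup table that replaces a chain of if/else branches. The sketch below is only an illustration under assumed names: struct unit, UNITS, and unit_size are invented here, and the unit sizes are powers of 1024 by assumption. Supporting another unit means adding a row to the documented table, not another execution path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Each row pairs a unit suffix with the number of bytes in one unit. */
struct unit {
        const char *suffix;
        long        bytes;
};

static const struct unit UNITS[] = {
        { "B",  1L },
        { "KB", 1024L },
        { "MB", 1024L * 1024L },
        { "GB", 1024L * 1024L * 1024L }
};

/* Returns the number of bytes in one unit, or -1 if suffix is unknown. */
long unit_size(const char *suffix)
{
        size_t num_units = sizeof(UNITS) / sizeof(UNITS[0]);

        assert(suffix != NULL);

        for (size_t i = 0; i < num_units; i++) {
                if (strcmp(suffix, UNITS[i].suffix) == 0) {
                        return UNITS[i].bytes;
                }
        }

        return -1;
}
```

The function itself has a single loop and a nesting depth of two, no matter how many units the table lists.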

Leaving tracks when things aren't right

This is not a requirement for COMP 40, but it's a terrific idea to mark any code that has shortcomings or that needs further attention with a uniform marker in comments. One example is: NEEDSWORK. So, a piece of code might have a comment like this:

   /*****************************************************************
    *                  
    *                     getUserName
    *                  
    *   NEEDSWORK: doesn't correctly handle names from Asian
    *             countries in which family name comes ahead of
    *             given name.
    *                  
    *****************************************************************/
...or...

    fopen("myfile", "rb"); /* NEEDSWORK: not sure if "rb" is right mode */
Marking problems in this way has a number of advantages:
  • It makes you think about the problem, and may encourage you to fix it
  • A week later, when your code is doing something mysterious, that NEEDSWORK may remind you of something you knew you didn't do right
  • Before final testing or submission, you can grep all of your source code including header files for that NEEDSWORK string. You'll get a list of everything in your program that you thought might not be right, and you can fix everything that needs fixing before release.
The worst thing you can do with a known problem is to lose track of it: almost surely at some later time, and possibly at a very inconvenient time, you will have to reconstruct your knowledge of the problem the hard way. NEEDSWORK markings can help.

You may be wondering: won't I lose credit in COMP 40 if I point out bugs or misunderstandings? On the contrary, typically we will respect the fact that you are aware of the problem, and we may deduct less for a problem when you show us explicitly that you are aware of it.

Use Emacs — it makes it easy

Set Emacs's C Indentation Style to linux. For an individual editing session, you can type
C-c . linux
It's better to set up emacs to do this for all C source code. See using your text editor for fast compiles to find out how. When you are typing your code and you come to the end of a line, type C-j, rather than a return. If you do this, Emacs indents the next line automatically according to the indentation rules. Alternatively, hitting Tab will indent the current line according to the current rules. If you have edited a function and ruined the indentation (or you fear you may have brace or parenthesis problems), you can indent an entire block by placing the cursor over the opening curly brace of the block and typing M-C-q.

Some more thoughts on good code and documentation

In your comments and other documentation, find a way to explain your work that is appropriate to the size of the problem.
  • For a trivial solution of a few lines of code, it may be best to have no explanation at all. (We're not entering the Obfuscated C Contest!)
  • For a solution of a dozen lines of code, a sentence or two is usually sufficient.
  • You should avoid writing tricky code whenever possible, but sometimes the best solution is subtle. In these rare cases, it may be appropriate to have an extensive comment even over a line or two of code, warning readers of whatever may not be obvious.
For these kinds of small problems, the best method of explanation is almost always comments in the source code.
  • For a solution of a hundred lines or more, I would expect several paragraphs of explanation. The explanation should cover not so much the code itself, but the organization of the code and the plan that produced it.
  • For larger programs, especially programs that are divided into multiple files, it is appropriate to go through a more thorough design process, some elements of which should be used to explain your work.
For these larger problems, you must describe your thinking at least as well as your code. For COMP 40, these kinds of long explanations should typically be in the README file, not in the code itself. It is often appropriate to include brief comments about the overall architecture of a module in the source code, especially if that structure is not immediately obvious from the content and layout of the source itself.

Modularity and good documentation are complementary

Good programming style and documentation are essential ingredients in a clear and readable assignment. Good style includes the use of functions, modules, and abstract data types where appropriate.
  • Document every representation.
  • For all but the smallest problems, divide your own code into files, using the interfaces and implementations style. Use the slogan "every module hides a secret".
  • Choose appropriate names for functions and variables:
    • Exported names should be long enough to be descriptive, like Seg_addhi or Segment_grow.
    • Private names, including the names of parameters, can be shorter, but they should still be descriptive, like add.
    • Often it's appropriate to use a "p" or "_p" suffix on pointer variables, e.g. student_p for a variable of type struct student *.
    • For short-lived temporaries, use C's conventions for one-letter names: i for index, n for integer, s for string, p for pointer, and so on.
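
The naming advice above might look like this in practice. This is a sketch only: the Roster_ prefix, the function, and the fields of struct student are hypothetical and chosen just to show each kind of name. The exported function has a long descriptive name, the pointer parameter carries a _p suffix, and the short-lived loop index is the one-letter i.

```c
#include <assert.h>
#include <stddef.h>

struct student {
        const char *name;
        int         year;   /* expected graduation year */
};

/*
 * Exported name: the Roster_ prefix (hypothetical) names the module, and
 * the rest of the name says what the function computes.
 */
int Roster_count_in_year(const struct student *roster_p, int n, int year)
{
        int count = 0;

        assert(roster_p != NULL);

        for (int i = 0; i < n; i++) {
                if (roster_p[i].year == year) {
                        count++;
                }
        }

        return count;
}
```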

Use system library functions

Good style normally includes using library functions where possible, instead of duplicating effort by re-implementing something yourself. However, some libraries have such complex interfaces that you may be justified in avoiding them; when in doubt, ask the course staff.

Use good algorithms

Choose algorithms appropriate to the task at hand, and make your code structured and easy to read. Good use and placement of documentation is vital. Lots of comment text does not necessarily mean good documentation.

Last but not least...

If you're still reading this, you may be interested in checking out the Linux Kernel Coding Standards, which lay out the standards for modifications made to the open source operating system, Linux.