CS 40 Coding Standards

Why coding standards are important

Computer programs are read by people as well as by machines. The best programmers take pride in producing code that is appealing to look at and easy to understand. There are many benefits to writing code clearly and consistently and ensuring the layout of your source code reflects its underlying logical structure. These benefits include:

Errors are easier for you to spot when you name things consistently, indent consistently, etc.
Sooner or later your code will probably be used or maintained by others: they will be delighted if at first glance it's obvious how your code is arranged, and where to look for what.
Writing good, clear comments helps you think clearly about your code.
If your naming, punctuation, and indentation are consistent, then automated tools can more easily search and manipulate your code. For example, in the CS 40 coding standards, an open curly brace { alone on a line almost surely marks the start of a function definition. Editor macros and other tools can search for that.
In CS 40 in particular, the grader reading your code will need to rapidly understand what you are trying to do, and how you did it.

Of course, all the sound principles of abstraction and modularity apply as well!

Your structure and organization grade will suffer substantially if your code doesn't observe the standards set out here. So, read all of this material carefully. If you have any doubts, ask. (I'm sorry to be so strict, but experience shows that most students simply disregard the requirement unless there is a serious grading implication.)

Requirements for CS 40 code

All CS 40 code must strictly observe these requirements:

Language version and compiler switches

Code you submit must be accepted by gcc without errors or warnings. CS 40 uses ISO C99, so you will use these options:

  gcc -std=c99 -pedantic -Wall -Wextra -Werror

You must not use library routines that are non-standard extensions provided by GNU. Many but not all such extensions require you to #define _GNU_SOURCE, which you must not do. Also: for the intro homework assignment, you MUST NOT use or consult the man pages or documentation for the library routines getline or getdelim.

No runtime faults

A program that dumps core, e.g., due to a "segmentation fault," earns No Credit for functional correctness, unless the assignment specifically allows for such failures.

Certain assignments allow for assertion failures or, to use Hanson's term, Checked Runtime Errors (CREs); the default handler for such exceptions does dump core. There is no deduction in grade for such core dumps if the assignment allows for such assertions or exceptions.

This implies that you must do thorough error checking! In all systems and library code, there should be no undocumented unchecked runtime errors (that is, the code should never crash except in ways that are documented). Assignments will say explicitly whether unchecked runtime errors are permitted, and permitting them is rare. Therefore:

All library calls that can produce a failure must be followed by tests for failure. This prominently includes malloc and calloc, which can fail if sufficient memory is unavailable. Similarly, fopen, strtol, and any other built-in function that can fail should have the call followed by checks for success and appropriate error handling.
Thus, it can often be good practice to wrap such calls in your own (appropriately named) functions that validate the results. This is especially useful when failure results in a checked runtime error. Hanson has done this for memory allocation, and you are encouraged to use his mem interface. (See the C idioms page for an example.)
Do not use library functions that can fail with no indication of failure such as atoi (use sscanf, strtol, or similar instead and check for errors).
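As a sketch of the wrapping advice above, here are two checking wrappers. The names checked_malloc and checked_fopen are hypothetical, not part of any library or of Hanson's interfaces. Hanson's mem interface (ALLOC, NEW) remains the recommended choice for allocation; it raises an exception on failure rather than printing a message and calling exit as these sketches do.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * checked_malloc (hypothetical): allocate nbytes bytes, or print a
 * message and terminate.  Never returns NULL for a nonzero request.
 */
void *checked_malloc(size_t nbytes)
{
        void *p = malloc(nbytes);

        if (p == NULL && nbytes > 0) {
                fprintf(stderr, "checked_malloc: out of memory "
                        "(%zu bytes requested)\n", nbytes);
                exit(EXIT_FAILURE);
        }
        return p;
}

/*
 * checked_fopen (hypothetical): open path with the given mode, or
 * print a message and terminate.  Never returns NULL.
 */
FILE *checked_fopen(const char *path, const char *mode)
{
        FILE *fp = fopen(path, mode);

        if (fp == NULL) {
                fprintf(stderr, "checked_fopen: cannot open %s\n", path);
                exit(EXIT_FAILURE);
        }
        return fp;
}
```

A caller can then write int *a = checked_malloc(n * sizeof *a); with no separate NULL test, because the wrapper has already checked for failure.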

Valgrind

To earn a grade of Very Good or better, code must run under valgrind without leaks or errors. There are two exceptions to this policy:

Memory leaks attributable to use of Hanson's Atoms will not count against you, because there is no mechanism for freeing Atom memory, and thus freeing it is impossible. (Note: This is not an oversight or a bug in Hanson's Atom facility. All languages with an equivalent data type have this property, because it is essential for the semantics of an Atom, or whatever the relevant language calls it.)
In situations where the assignment allows termination with a checked runtime exception and where your program actually raises such an exception, memory leaks will not reduce your grade, because avoiding these leaks is impossible.

Note: you may find it helpful when checking for leaks to run valgrind with the following options:

      valgrind --leak-check=full --show-reachable=yes <your_program>

To earn a grade of Good or better, code must run under valgrind without errors.

Source code formatting

For this course, we will adopt a variant of the Linux kernel coding style. These are the rules that govern code that is written for the Linux kernel. Many of you will find that the brace placement and/or the indentation will take time to get used to. That's ok (I had to switch, too). Here are the formatting rules:

Your code must not wrap when displayed in 80 columns.
Each level of indentation must be eight (8) characters. Not 4 or 5 or (shudder) 2 spaces.
Your code must not contain tab characters. This is different from Linus's rules, but relieves us of the difficulty of having everyone set up their tabs the same. To see how to set up emacs or vim to use spaces rather than tab, see the notes on using your text editor for fast compiles.
Open curly braces at the start of a function go on a line by themselves, but other open curly braces go at the end of the line that opens the block (with a space before the brace). Yes, it's inconsistent. Follow Linus's conventions!
Curly braces should be used for all conditional statements and loops, even if their body is a single line. This is different from Linus's rules, but protects you from errors that can occur when you change your code later.
Put spaces around infix operators, such as =, ==, +, *, etc. For example x = NULL; is right, and x=NULL; is wrong.
Put a space after every comma in an argument list and after every semi-colon in a for statement.
Put a space between the keywords if, for, while and the following parenthesis. They aren't functions, so don't write them like function calls.
Separate curly braces from adjacent, non-blank characters. That is, don't write if (test){. Write if (test) {.
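A short function that follows each of the rules above might look like this (count_positive is an illustrative name). Note the function's opening brace on a line of its own, the braces at the end of the for and if lines, braces around a one-line body, eight-space indentation with no tabs, and spaces around operators, after the semicolons in the for header, and after the keywords.

```c
#include <stddef.h>

/*
 * count_positive (illustrative): return the number of entries of
 * values[0 .. len - 1] that are greater than zero.
 */
int count_positive(const int values[], size_t len)
{
        int count = 0;

        for (size_t i = 0; i < len; i++) {
                if (values[i] > 0) {
                        count++;
                }
        }
        return count;
}
```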

I would like to use my eyes for a few more decades, and you'll find that many of these little things help a lot at 4 am. (You would follow these rules if you were writing a paper, and a program is a paper in a formal language written for humans to read.)

More on indentation: do not yield to temptation

Spreading things out and using large indentations make it difficult to keep to the 80 character line width, and you will be tempted to do bad things.

Do not violate the indentation rules in order to keep a bunch of code on one line. Breaking lines at appropriate places is good (as long as proper indentation is used). Do not squeeze space out of expressions (e.g., do not remove the spaces around operators, after commas, etc.). The following is bad:

     int x=2;
   int y=x*a_long_function_name(x*x,atoi(argv[1]),g(x+1,NULL));

This is much better:

     int x = 2;
     int y = x * a_long_function_name(x * x,
                                      int_from_string(argv[1]),
                                      g(x + 1, NULL));

You can tell how many arguments this function has even 6 feet from the terminal! It also has the benefit of using a function the author evidently wrote to validate that argv[1] actually contains an integer.
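The standards do not show int_from_string, so here is one plausible version, offered only as a sketch: the name comes from the example above, but the body, the use of strtol, and the choice to print a message and exit on bad input are assumptions. It checks the three things atoi silently ignores: no digits at all, trailing junk, and values out of range.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * int_from_string (hypothetical body): convert s to an int, or print
 * an error and exit if s is not entirely a base-10 integer that fits
 * in an int.  s must not be NULL.
 */
int int_from_string(const char *s)
{
        char *end;
        long value;

        errno = 0;
        value = strtol(s, &end, 10);

        if (end == s || *end != '\0') {
                fprintf(stderr, "not an integer: \"%s\"\n", s);
                exit(EXIT_FAILURE);
        }
        if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
                fprintf(stderr, "integer out of range: %s\n", s);
                exit(EXIT_FAILURE);
        }
        return (int)value;
}
```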

On the positive side, large indentations, spreading things out, and the 80-column rule have a way of conspiring to keep your functions simpler. Every level of indentation represents something the reader of your code has to keep in their heads, and these rules mean you can't have too many levels of indentation!

Use `stdbool.h`

Use stdbool.h so you can use the type bool and the constants true and false when you are dealing with boolean data.

Don't rely on puns

Alas, the Hanson book violates this constantly. But you won't.

C had no boolean type until C99 added _Bool (which stdbool.h calls bool), and any integer or pointer value can still serve as a test. This leads to lots of puns that make code less clear.

If you call a function that returns a boolean value (whether it uses the definitions from stdbool.h internally or uses integers 1 for true, 0 for false), then do not do an explicit comparison against an integer:

     if (is_numerical_expr(expr)) {
             return evaluate(expr);
     }

Don't compare the result of is_numerical_expr(expr) to 1. (And if you want to know whether it's false, you should write !is_numerical_expr(expr).)

If an expression is really an integer expression, do make the comparison explicit. If n is an integer that counts down, write while (n > 0) ... rather than while (n) .... Write if (strncmp(s, t, n) == 0) ... rather than if (!strncmp(s, t, n)) ....

Using pointer values as booleans is an even more obscure multi-layer pun! (The C standard does not require that the null pointer be represented by all-zero bits: it just requires that when you write a constant 0 in a pointer context, it's interpreted as the null pointer.)

Never write 0 for the null pointer, by the way. Write NULL.

Write assert(p != NULL); and if (p == NULL) ... rather than what you see in the book (assert(p) and if (!p) ... — yuk!).
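The sketch below collects the preferred forms from this section in compilable C. The helpers has_prefix, sum_down, and first_non_comment are hypothetical names chosen for illustration; the on-line comments name the punning form each line avoids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if s begins with prefix.  Returns bool, so test it directly. */
bool has_prefix(const char *s, const char *prefix)
{
        assert(s != NULL);                      /* not assert(s) */
        assert(prefix != NULL);

        return strncmp(s, prefix, strlen(prefix)) == 0; /* not !strncmp */
}

/* Sum n + (n - 1) + ... + 1; n is an integer, so compare it. */
int sum_down(int n)
{
        int total = 0;

        while (n > 0) {                         /* not while (n) */
                total += n;
                n--;
        }
        return total;
}

/* First of lines[0 .. n - 1] not starting with '#', or NULL if none. */
const char *first_non_comment(const char *lines[], size_t n)
{
        for (size_t i = 0; i < n; i++) {
                /* not: lines[i] && has_prefix(lines[i], "#") == 0 */
                if (lines[i] != NULL && !has_prefix(lines[i], "#")) {
                        return lines[i];
                }
        }
        return NULL;                            /* not return 0 */
}
```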

Comments

Though C99 (and therefore gcc) also accepts // comments, you must use only /* ... */ style comments. Do not use //. In your programs, most single-line comments should look like this:

   /* I am an acceptable single-line comment. */

Very important single-line comments, and most multiple-line comments, should look like this:

   /*
    * I am a multi-line comment.  My indentation should match the indentation
    * of the code that surrounds me.
    */

Sometimes it's useful to set off major sections of code like this:

   /*****************************************************************
    *                  Data formatting functions
    *****************************************************************/

or like this:

   /*----------------------------------------------------------------*
    |
    |                  Data formatting functions
    |
    *----------------------------------------------------------------*/

Note that the above are legal C comments starting with /* and ending with */.

Where a major portion of the code needs explanation or warnings, a box can be a good place to put it.


   /*****************************************************************
    *                  
    *                          login
    *                  
    *   Called when a user logs on.  This code is a little tricky, 
    *   because the new user structure we're creating may
    *   at the same time be accessed by other threads.
    *   Be sure to lock all accesses to structure members.
    *                  
    *****************************************************************/

Do not put big clunky boxes like this ahead of every function or all over your code. Use a hierarchy of commenting styles to make it easy for a reader to navigate the code. Boxes can be useful to set off major sections (though you should be suspicious: if code is long enough to require much of this, ask if it should be broken into multiple source files). Use the smaller style comments for individual functions, ahead of loops (if they require commenting), etc. Use "on the line" comments in cases where an individual line needs explanation. This can be particularly useful for variable declarations and initializations.

Sign your code

Begin every file you write with a comment block containing at least the following information:

/*
 *     filename
 *     by name(s), date
 *     assignment
 *
 *     summary
 */

...or if it's more appropriate to the style of the rest of your code, put this information in a box that begins the code, e.g.:

/**************************************************************
 *
 *                     filename
 *
 *     Assignment: assignment
 *     Authors:  name(s)
 *     Date:     date
 *
 *     summary
 *
 *     ...you may provide more information here about
 *        the program or file, its interfaces, etc.
 *
 **************************************************************/

In the above, filename is the name of the file, name(s) gives the name(s) of the file's author(s), date is the completion date, assignment tells which homework or project assignment this is for, and summary is a brief description of what the code in the file is for and hints about how it works (if that isn't obvious). This should include the relationship to other files where appropriate. For example, a C file that defines a resource (e.g., a hash table implementation) might give the name of the include file(s) that clients should use. The block may also include a change log for the benefit of the author(s).

The exact format of this header doesn't matter as long as it is clear and consistent with the formatting of the rest of the source file (e.g., some files may use heavy boxes in one style or another, in which case it's fine to use that format).

Function Contracts

Still a thing in 40! Function contracts should provide a reader with a high level overview of the function. We expect all your functions to have function contracts. A function contract should be written for a client. It should cover a high level overview of the functionality, what the parameters represent, what the return value is and represents, expectations, any changes to the program state, as well as checked and unchecked runtime errors.

Example of a function contract:

/********** allScoresUnderLimit ********
 *
 * Return true if all scores are under a given limit and return number of
 * scores under the limit via reference parameter
 *
 * Parameters:
 *      int *scoresUnderLimit: address of place to store # scores < limit
 *      int limit:             limit to compare to
 *      int scores[]:          array of scores
 *      int len:               length of scores array
 *
 * Return: true if all scores are under limit, false if not
 *
 * Expects
 *      scores and scoresUnderLimit must not be NULL
 * Notes:
 *      *scoresUnderLimit is set to the number of scores under limit
 *      Will CRE if scores or scoresUnderLimit is NULL
 *      May CRE if malloc fails.
 ************************/
bool allScoresUnderLimit(int *scoresUnderLimit, int limit,
                         int scores[], int len);

Students often ask whether function contracts should go in the .h or .c file, or both. This is a great question. Function contracts are for clients, and clients are not looking at the .c file, which would argue for putting them in the .h file. However, graders would like to refer to your contracts while reviewing your code, so put them in the .c file. There is no need to put them in the .h file: in real life, clients would look to external documentation, e.g., on a web page. In fact, there are modern tools that will take contracts and function headers from a .c file and make a web page for your interface! We will not use those tools in CS 40.

Simple code is better

In order to keep code comprehensible, it is very important to minimize the number of things a reader has to keep in their head at any one time. This leads directly to two important coding conventions:

Use more relatively short functions rather than fewer long functions — in other words, use procedural abstraction extensively.
Keep the complexity of functions down. Do this by structuring your code so that the maximum nesting depth is on the order of 3 (it's hard to keep track of 5 pending conditions or the state of 5 nested loops). Another way to keep code simple is to encode complexity in data structures rather than algorithms whenever possible. Extensive documentation of a static structure definition is easier to understand than code with tons of possible execution paths.
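As a small illustration of moving complexity from control flow into data, the sketch below (the names days_per_month, is_leap_year, and days_in_month are hypothetical) answers "how many days are in this month?" with a documented table and a single special case, instead of a twelve-way chain of if statements.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Days in each month of a non-leap year, indexed by month - 1.
 * The table carries knowledge that would otherwise be spread across
 * many execution paths.
 */
static const int days_per_month[12] = {
        31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

/* Gregorian leap-year rule. */
bool is_leap_year(int year)
{
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Days in month (1 to 12) of year; a month out of range is a CRE. */
int days_in_month(int month, int year)
{
        assert(month >= 1 && month <= 12);

        if (month == 2 && is_leap_year(year)) {
                return 29;
        }
        return days_per_month[month - 1];
}
```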

Leaving tracks when things aren't right

This is not a requirement for CS 40, but it's a terrific idea to mark any code that has shortcomings or that needs further attention with a uniform marker in comments. One example is NEEDSWORK. So, a piece of code might have a comment like this:

   /*****************************************************************
    *                  
    *                     getUserName
    *                  
    *   NEEDSWORK: doesn't correctly handle names from Asian
    *              countries in which family name comes ahead of
    *              given name.
    *                  
    *****************************************************************/

...or...

    fp = fopen("myfile", "rb"); /* NEEDSWORK: not sure if "rb" is right mode */

Marking problems in this way has a number of advantages:

It makes you think about the problem, and may encourage you to fix it.
A week later, when your code is doing something mysterious, that NEEDSWORK may remind you of something you knew you didn't do right.
Before final testing or submission, you can grep all of your source code including header files for that NEEDSWORK string. You'll get a list of everything in your program that you thought might not be right, and you can fix everything that needs fixing before release.

The worst thing you can do with a known problem is to lose track of it: almost surely at some later time, and possibly at a very inconvenient time, you will have to reconstruct your knowledge of the problem the hard way. NEEDSWORK markings can help.

You may be wondering: won't I lose credit in CS 40 if I point out bugs or misunderstandings? On the contrary, typically we will respect the fact that you are aware of the problem, and we may deduct less for a problem when you show us explicitly that you are aware of it.

Use Emacs — it makes it easy

Set Emacs's C Indentation Style to linux. For an individual editing session, you can type

C-c . linux

It's better to set up emacs to do this for all C source code, and you will want to tell Emacs to use spaces rather than tab characters. See using your text editor for fast compiles to find out how. When you are typing your code and you come to the end of a line, type C-j, rather than a return. If you do this, Emacs indents the next line automatically according to the indentation rules. Alternatively, hitting Tab will indent the current line according to the current rules. If you have edited a function and ruined the indentation (or you fear you may have brace or parenthesis problems), you can indent an entire block by placing the cursor over the opening curly brace of the block and typing M-C-q.

Some more thoughts on good code and documentation

In your comments and other documentation, find a way to explain your work that is appropriate to the size of the problem.

For a trivial solution of a few lines of code, it may be best to have no explanation at all. (We're not entering the Obfuscated C Contest!)
For a solution of a dozen lines of code, a sentence or two is usually sufficient.
You should avoid writing tricky code whenever possible, but sometimes the best solution is subtle. In these rare cases, it may be appropriate to have an extensive comment even over a line or two of code, warning readers of whatever may not be obvious.

For these kinds of small problems, the best method of explanation is almost always comments in the source code.

For a solution of a hundred lines or more, I would expect several paragraphs of explanation. The explanation should cover not so much the code itself, but the organization of the code and the plan that produced it.
For larger programs, especially programs that are divided into multiple files, it is appropriate to go through a more thorough design process, some elements of which should be used to explain your work.

For these larger problems, you must describe your thinking at least as well as your code. For CS 40, these kinds of long explanations should typically be in the README file, not in the code itself. It is often appropriate to include brief comments about the overall architecture of a module in the source code, especially if that structure is not immediately obvious from the content and layout of the source itself.

Modularity and good documentation are complementary

Good programming style and documentation are essential ingredients in a clear and readable assignment. Good style includes the use of functions, modules, and abstract data types where appropriate.

Document every representation.
For all but the smallest problems, divide your own code into files, using the interfaces and implementations style. Use the slogan "every module hides a secret".
Choose appropriate names for functions and variables:
- Exported names should be long enough to be descriptive, like Seg_addhi or Segment_grow.
- Private names, including the names of parameters, can be shorter, but they should still be descriptive, like add.
- Often it's appropriate to use a "p" or "_p" suffix on pointer variables, e.g. student_p for a variable of type struct student *.
- For short-lived temporaries, use C's conventions for one-letter names: i for index, n for integer, s for string, p for pointer, and so on.
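The sketch below illustrates these conventions together: a documented representation with its invariant, an exported function with a descriptive Module_operation style name, a _p suffix on a pointer, and a one-letter loop index. The struct student type, MAX_SCORES, and Student_total are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SCORES 10

/*
 * Representation (hypothetical example): one student's exam record.
 *   scores[0 .. nscores - 1] are valid scores, each from 0 to 100;
 *   entries at or beyond nscores are unused.
 *   Invariant: 0 <= nscores <= MAX_SCORES, and name is never NULL.
 */
struct student {
        const char *name;               /* student's full name */
        int scores[MAX_SCORES];         /* only first nscores are valid */
        int nscores;                    /* number of valid scores */
};

/*
 * Student_total: exported, so the name says what it does.
 * student_p points to a struct student; i is a short-lived index.
 */
int Student_total(const struct student *student_p)
{
        int total = 0;

        assert(student_p != NULL);
        for (int i = 0; i < student_p->nscores; i++) {
                total += student_p->scores[i];
        }
        return total;
}
```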

Use system library functions

Good style normally includes using library functions where possible, instead of duplicating effort by re-implementing something yourself. However, some libraries have such complex interfaces, or involve so much computational overhead, that you may be justified in avoiding them; when in doubt, ask the course staff.

Use good algorithms

Choose algorithms appropriate to the task at hand, and make your code structured and easy to read. Good use and placement of documentation is vital. Lots of comment text does not necessarily mean good documentation.

Appropriate technology choices

Good programmers understand that certain approaches to doing things involve far more computing time, memory space, or input/output to external files than other approaches. Code that makes unreasonable choices when applying these technologies will typically receive little or no credit for style and organization in CS 40. For example:

Floating point arithmetic typically involves much more compute overhead than integer arithmetic or logical operations, and ensuring that significant data is preserved can be tricky when using floating point. Accordingly, floating point formats and libraries should typically not be used when equivalents are available using integers and/or logical operations. For example, when computing 2ⁿ for n < 64 it is typically unacceptable to use a function call like pow(2, n), which uses a lot of floating point arithmetic to implement generalized exponentiation. A shift such as (uint64_t)1 << n is typically a far more efficient and more appropriate alternative. (Note that the value shifted is 1: 2 << n would compute 2ⁿ⁺¹.)
Writing and reading intermediate results to/from a temporary file is not allowed in CS 40 assignment solutions, unless specifically suggested or required in the assignment instructions. Reason: Writing data to a file or reading it back typically takes several milliseconds (thousandths of a second). Updating an entry in most reasonable data structures can be done in microseconds (millionths of a second), or sometimes even less. Remember, each CPU operation, such as adding two numbers, can be done in under a billionth of a second. So, your computer can do hundreds of thousands or perhaps millions of useful operations in the time it takes to write even a small amount of data into a file using fwrite or similar services. Most CS 40 assignments do not require that you discover the most optimized approach to solving the problem, but reasonable performance is expected. Unnecessarily writing and then reading back temporary files is thus typically unacceptable; solutions that use such approaches may incur a significant grading penalty, or may earn no credit at all, depending on the circumstance.
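The floating point item above recommends a shift in place of pow(2, n). A minimal sketch, using the hypothetical name power_of_two: the value shifted is an unsigned 64-bit 1, because 1 << n is 2ⁿ while 2 << n is 2ⁿ⁺¹, and because shifting the plain int constant 1 left by 31 or more places is undefined behavior where int is 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Return 2 to the power n for 0 <= n < 64, using no floating point. */
uint64_t power_of_two(unsigned n)
{
        assert(n < 64);
        return (uint64_t)1 << n;
}
```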

Last but not least...

If you're still reading this, you may be interested in checking out the Linux Kernel Coding Standards, which lay out the standards for modifications made to the open source operating system, Linux. (This is an older, more pithy version than the current one. If you want to see the latest version and learn more about the Linux kernel, search for “linux kernel coding style” and visit www.kernel.org.)