CS 40 Coding Standards
Why coding standards are important
Computer programs are read by people as well as by machines. The best programmers take pride in producing code that is appealing to look at and easy to understand. There are many benefits to writing code clearly and consistently and ensuring the layout of your source code reflects its underlying logical structure. These benefits include:
- Errors are easier for you to spot when you name things consistently, indent consistently, etc.
- Sooner or later your code will probably be used or maintained by others: they will be delighted if at first glance it's obvious how your code is arranged, and where to look for what.
- Writing good clear comments helps you think clearly about your code
- If your naming, punctuation, and indentation are consistent, then automated tools can more easily search and manipulate your code. For example, in the CS 40 coding standards, an open curly brace { alone on a line almost surely marks the start of a function definition. Editor macros and other tools can search for that.
- In CS 40 in particular, the grader reading your code will need to rapidly understand what you are trying to do, and how you did it.
Your structure and organization grade will suffer substantially if your code doesn't observe the standards set out here. So, read all of this material carefully. If you have any doubts, ask. (I'm sorry to be so strict, but experience shows that most students simply disregard the requirement unless there is a serious grading implication.)
Requirements for CS 40 code
Language version and compiler switches
Code you submit must be accepted by gcc without errors or warnings. CS 40 uses ANSI C-99, so you will use these options:
gcc -std=c99 -pedantic -Wall -Wextra -Werror
You must not use library routines that are non-standard extensions provided by GNU. Many but not all such extensions require you to #define _GNU_SOURCE, which you must not do.
Also: for the intro homework assignment, you MUST NOT use or consult the man pages or documentation for the library routines getline or getdelim.
No runtime faults
A program that dumps core, e.g., due to a "segmentation fault" earns No Credit for functional correctness, unless the assignment specifically allows for such failures.
Certain assignments allow for assertion failures or, to use Hanson's term, Checked Runtime Errors (CREs); the default handler for such exceptions does dump core. There is no deduction in grade for such core dumps if the assignment allows for such assertions or exceptions.
This implies that you must do thorough error checking! In all systems and library code, there should be no undocumented unchecked runtime errors (that is, the code should never crash except in ways that are documented). Assignments will say explicitly whether unchecked runtime errors are permitted, and this is rare. Therefore:
- All library calls that can produce a failure must be followed by tests for failure. This prominently includes malloc and calloc, which can fail if sufficient memory is unavailable. Similarly, fopen, strtol, and any other built-in function that can fail should have the call followed by checks for success and appropriate error handling.
- Thus, it can often be good practice to wrap such calls in your own (appropriately named) functions that validate the results. This is especially useful when failure results in a checked runtime error. Hanson has done this for memory allocation, and you are encouraged to use his mem interface. (See the C idioms page for an example.)
- Do not use library functions that can fail with no indication of failure such as atoi (use sscanf, strtol, or similar instead and check for errors).
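As an illustration of the rules above, here is a minimal sketch of two wrapper functions. The names checked_malloc and int_from_string are hypothetical: they are not part of Hanson's mem interface or of any course library, and the choice to treat allocation failure as an assertion failure is an assumption that only suits assignments that permit checked runtime errors.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * checked_malloc (hypothetical name)
 * Returns a pointer to nbytes of fresh memory, where nbytes > 0.
 * A failed allocation is a checked runtime error: the assertion
 * fails instead of handing NULL back to the caller.
 */
void *checked_malloc(size_t nbytes)
{
        void *block_p;

        assert(nbytes > 0);

        block_p = malloc(nbytes);
        assert(block_p != NULL);

        return block_p;
}

/*
 * int_from_string (hypothetical name)
 * Converts the decimal integer in s and stores it through result_p.
 * Returns true on success. Returns false, leaving *result_p alone,
 * if s holds no digits, has trailing characters, or does not fit
 * in an int. It is a checked runtime error to pass NULL.
 */
bool int_from_string(const char *s, int *result_p)
{
        char *end_p = NULL;
        long value;

        assert(s != NULL);
        assert(result_p != NULL);

        errno = 0;
        value = strtol(s, &end_p, 10);

        if (end_p == s || *end_p != '\0') {
                return false;   /* no digits, or junk after the number */
        }
        if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
                return false;   /* out of range for an int */
        }

        *result_p = (int)value;
        return true;
}
```

Unlike atoi, this conversion lets the caller tell the input "0" apart from input that is not a number at all.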
Valgrind
To earn a grade of Very Good or better, code must run under valgrind without leaks or errors.
There are two exceptions to this policy:
- Memory leaks attributable to use of Hanson's Atoms will not count against you, because there is no mechanism for freeing Atom memory, and thus freeing it is impossible. (Note: This is not an oversight or a bug in Hanson's Atom facility. All languages with an equivalent data type have this property, because it is essential for the semantics of an Atom, or whatever the relevant language calls it.)
- In situations where the assignment allows termination with a checked runtime exception and where your program actually raises such an exception, memory leaks will not reduce your grade, because avoiding these leaks is impossible.
valgrind --leak-check=full --show-reachable=yes <your_program>
To earn a grade of Good or better, code must run under valgrind without errors.
Source code formatting
For this course, we will adopt a variant of the Linux kernel coding style. These are the rules that govern code that is written for the Linux kernel. Many of you will find that the brace placement and/or the indentation will take time to get used to. That's ok (I had to switch, too). Here are the formatting rules:
- Your code must not wrap when displayed in 80 columns.
- Each level of indentation must be eight (8) characters. Not 4 or 5 or (shudder) 2 spaces.
- Your code must not contain tab characters. This is different from Linus's rules, but relieves us of the difficulty of having everyone set up their tabs the same. To see how to set up emacs or vim to use spaces rather than tab, see the notes on using your text editor for fast compiles.
- Open curly braces at the start of a function go on a line by themselves, but other open curly braces go at the end of the line that opens the block (with a space before the brace). Yes, it's inconsistent. Follow Linus's conventions!
- Curly braces should be used for all conditional statements and loops, even if their body is a single line. This is different from Linus's rules, but protects you from errors that can occur when you change your code later.
- Put spaces around infix operators, such as =, ==, +, *, etc. For example x = NULL; is right, and x=NULL; is wrong.
- Put a space after every comma in an argument list and after every semicolon in a for statement.
- Put a space between the keywords if, for, while and the following parenthesis. They aren't functions, so don't write them like function calls.
- Separate curly braces from adjacent, non-blank characters. That is, don't write if (test){. Write if (test) {.
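To see the formatting rules above working together, here is a small sketch. The function count_positive is a made-up example, not taken from any assignment. It uses eight-space indentation with no tabs, puts the function's opening brace on its own line and every other opening brace at the end of its line, braces the single-statement bodies, and puts spaces around operators and after keywords, commas, and semicolons.

```c
/*
 * count_positive (hypothetical example)
 * Returns how many of the first len elements of values are
 * greater than zero.
 */
int count_positive(const int values[], int len)
{
        int count = 0;

        for (int i = 0; i < len; i++) {
                if (values[i] > 0) {
                        count = count + 1;
                }
        }

        return count;
}
```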
More on indentation: do not yield to temptation
Spreading things out and using large indentations make it difficult to keep to the 80 character line width, and you will be tempted to do bad things.
Do not violate the indentation rules in order to keep a bunch of code on one line. Breaking lines at appropriate places is good (as long as proper indentation is used). Do not squeeze space out of expressions (e.g., do not remove the spaces around operators, after commas, etc.). The following is bad:
int x=2;
int y=x*a_long_function_name(x*x,atoi(argv[1]),g(x+1,NULL));
This is much better:
int x = 2;
int y = x * a_long_function_name(x * x,
                                 int_from_string(argv[1]),
                                 g(x + 1, NULL));
You can tell how many arguments this function has even 6 feet from the terminal! It also has the benefit of using a function the author evidently wrote to validate that argv[1] actually contains an integer.
On the positive side, large indentations, spreading things out, and the 80-column rule have a way of conspiring to keep your functions simpler. Every level of indentation represents something the reader of your code has to keep in their heads, and these rules mean you can't have too many levels of indentation!
Use stdbool.h
Use stdbool.h so you can use the type bool and the constants true and false when you are dealing with boolean data.
Don't rely on puns
Alas, the Hanson book violates this constantly. But you won't.
The fact that C doesn't have a native boolean type and uses integers for tests leads to lots of puns that make code less clear.
If you call a function that returns a boolean value (whether it uses the definitions from stdbool.h internally or uses integers 1 for true, 0 for false), then do not do an explicit comparison against an integer:
if (is_numerical_expr(expr)) {
        return evaluate(expr);
}
Don't compare the result of is_numerical_expr(expr) to 1. (And if you want to know whether it's false, you should write !is_numerical_expr(expr).)
If an expression is really an integer expression, do make the comparison explicit. If n is an integer that counts down, write while (n > 0) ... rather than while (n) .... Write if (strncmp(s, t, n) == 0) ... rather than if (!strncmp(s, t, n)) ....
Using pointer values as booleans is an even more obscure multi-layer pun! (The C standard does not require that the null pointer have the value 0: it just requires that when you write 0 in a pointer context, it's interpreted as the null pointer.)
Never write 0 for the null pointer, by the way. Write NULL.
Write assert(p != NULL); and if (p == NULL) ... rather than what you see in the book (assert(p) and if (!p) ... yuk!).
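The following sketch pulls these rules into one place. The functions has_prefix and count_prefixed are hypothetical examples. Boolean results are used directly as conditions, the integer results of strncmp and the countdown n are compared explicitly, and pointers are compared against NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * has_prefix (hypothetical example)
 * Returns true if s begins with prefix. It is a checked runtime
 * error for either argument to be NULL.
 */
bool has_prefix(const char *s, const char *prefix)
{
        assert(s != NULL);
        assert(prefix != NULL);

        /* strncmp returns an int, so compare it with 0 explicitly */
        return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * count_prefixed (hypothetical example)
 * Returns how many of the first n strings in words begin with prefix.
 */
int count_prefixed(const char *words[], int n, const char *prefix)
{
        int count = 0;

        assert(words != NULL);

        while (n > 0) {                 /* not: while (n) */
                n--;
                /* a bool result is used as is, not compared with 1 */
                if (has_prefix(words[n], prefix)) {
                        count++;
                }
        }

        return count;
}
```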
Comments
Though gcc allows other forms, you must use only /* ... */ style comments. Do not use //.
In your programs, most single-line comments should look like this:
/* I am an acceptable single-line comment. */
Very important single-line comments, and most multiple-line comments, should look like this:
/*
 * I am a multi-line comment. My indentation should match the indentation
 * of the code that surrounds me.
 */
Sometimes it's useful to set off major sections of code like this:
/*****************************************************************
 *                  Data formatting functions
 *****************************************************************/
or like this:
/*----------------------------------------------------------------*
 |
 |                  Data formatting functions
 |
 *----------------------------------------------------------------*/
Note that the above are legal C comments starting with /* and ending with */. Where a major portion of the code needs explanation or warnings, a box can be a good place to put it.
/*****************************************************************
 *
 *                          login
 *
 *     Called when a user logs on. This code is a little tricky,
 *     because the new user structure we're creating may
 *     at the same time be accessed by other threads.
 *     Be sure to lock all accesses to structure members.
 *
 *****************************************************************/
Do not put big clunky boxes like this ahead of every function or all over your code. Use a hierarchy of commenting styles to make it easy for a reader to navigate the code. Boxes can be useful to set off major sections (though you should be suspicious: if code is long enough to require much of this, ask if it should be broken into multiple source files). Use the smaller style comments for individual functions, ahead of loops (if they require commenting), etc. Use "on the line" comments in cases where an individual line needs explanation. This can be particularly useful for variable declarations and initializations.
Sign your code
Begin every file you write with a comment block containing at least the following information:
/*
 *     filename
 *     by name(s), date
 *     assignment
 *
 *     summary
 */
...or if it's more appropriate to the style of the rest of your code, put this information in a box that begins the code, e.g.:
/**************************************************************
 *
 *                     filename
 *
 *     Assignment: assignment
 *     Authors:    name(s)
 *     Date:       date
 *
 *     summary
 *
 *     ...you may provide more information here about
 *     the program or file, its interfaces, etc.
 *
 **************************************************************/
In the above, filename is the name of the file, name(s) gives the name(s) of the file's author(s), date is the completion date, assignment tells which homework or project assignment this is for, and summary is a brief description of what the code in the file is for and hints about how it works (if that isn't obvious). This should include the relationship to other files where appropriate. For example, a C file that defines a resource (e.g., a hash table implementation) might give the name of the include file(s) that clients should use. The block may also include a change log for the benefit of the author(s).
The exact format of this header doesn't matter as long as it is clear and consistent with the formatting of the rest of the source file (e.g., some files may use heavy boxes in one style or another, in which case it's fine to use that format).
Function Contracts
Still a thing in 40! Function contracts should provide a reader
with a high level overview of the function. We expect all your
functions to have function contracts.
A function contract should be written for a client. It
should cover a high level overview of the functionality, what the
parameters represent, what the return value is and represents,
expectations, any changes to the program state, as well as checked
and unchecked runtime errors.
Example of a function contract:
/********** allScoresUnderLimit ********
 *
 * Return true if all scores are under a given limit and return number of
 * scores under the limit via reference parameter
 *
 * Parameters:
 *      int *scoresUnderLimit:  address of place to store # scores < limit
 *      int limit:              limit to compare to
 *      int scores[]:           array of scores
 *      int len:                length of scores array
 *
 * Return: true if all scores are under limit, false if not
 *
 * Expects
 *      scores and scoresUnderLimit must not be NULL
 * Notes:
 *      *scoresUnderLimit is set to the number of scores under limit
 *      Will CRE if scores or scoresUnderLimit is NULL
 *      May CRE if malloc fails.
 ************************/
bool allScoresUnderLimit(int *scoresUnderLimit, int limit, int scores[],
                         int len);
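For illustration, here is one possible definition that would honor the contract above. It is only a sketch, not a course-provided implementation. It assumes that a failed assert is an acceptable way to raise the checked runtime errors the contract promises. Because this version never calls malloc, the contract's note about malloc failure does not come into play.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

bool allScoresUnderLimit(int *scoresUnderLimit, int limit, int scores[],
                         int len)
{
        int count = 0;

        /* checked runtime errors promised by the contract */
        assert(scores != NULL);
        assert(scoresUnderLimit != NULL);

        for (int i = 0; i < len; i++) {
                if (scores[i] < limit) {
                        count++;
                }
        }

        *scoresUnderLimit = count;
        return count == len;
}
```

A client can tell from the contract alone that *scoresUnderLimit is written on every call, whatever the return value is.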
Students often ask whether function contracts should go in the .h or .c file, or both. This is a great question. Function contracts are for clients, and clients are not looking at the .c file, which would argue for putting them in the .h file. However, graders would like to refer to your contracts while reviewing your code, so put them in the .c file. There is no need to put them in the .h file: in real life, clients would look to external documentation, e.g., on a web page. In fact, there are modern tools that will take contracts and function headers from a .c file and make a web page for your interface! We will not use those tools in CS 40.
Simple code is better
- Use more relatively short functions rather than fewer long functions — in other words, use procedural abstraction extensively.
- Keep the complexity of functions down. Do this by structuring your code so that the maximum nesting depth is on the order of 3 (it's hard to keep track of 5 pending conditions or the state of 5 nested loops). Another way to keep code simple is to encode complexity in data structures rather than algorithms whenever possible. Extensive documentation of a static structure definition is easier to understand than code with tons of possible execution paths.
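As a small illustration of encoding complexity in data rather than in control flow, the sketch below looks up month lengths in a documented table instead of working through a twelve-way chain of conditions. The names DAYS_IN_MONTH, is_leap_year, and days_in_month are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Days in each month of a non-leap year, indexed by month - 1. */
static const int DAYS_IN_MONTH[12] = {
        31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

/* Returns true if year is a leap year in the Gregorian calendar. */
bool is_leap_year(int year)
{
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/*
 * days_in_month (hypothetical example)
 * Returns the number of days in the given month (1 = January) of year.
 * It is a checked runtime error for month to be outside 1..12.
 */
int days_in_month(int month, int year)
{
        assert(month >= 1 && month <= 12);

        if (month == 2 && is_leap_year(year)) {
                return 29;
        }

        return DAYS_IN_MONTH[month - 1];
}
```

The function has a single level of nesting and one special case; everything else lives in the table, where it is easy to check by eye.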
Leaving tracks when things aren't right
/*****************************************************************
 *
 *                       getUserName
 *
 *     NEEDSWORK: doesn't correctly handle names from Asian
 *                countries in which family name comes ahead of
 *                given name.
 *
 *****************************************************************/
...or...
fopen("myfile", "rb"); /* NEEDSWORK: not sure if "rb" is right mode */
Marking problems in this way has a number of advantages:
- It makes you think about the problem, and may encourage you to fix it
- A week later, when your code is doing something mysterious, that NEEDSWORK may remind you of something you knew you didn't do right
- Before final testing or submission, you can grep all of your source code including header files for that NEEDSWORK string. You'll get a list of everything in your program that you thought might not be right, and you can fix everything that needs fixing before release.
You may be wondering: won't I lose credit in CS 40 if I point out bugs or misunderstandings? On the contrary, typically we will respect the fact that you are aware of the problem, and we may deduct less for a problem when you show us explicitly that you are aware of it.
Use Emacs — it makes it easy
linux
. For an
individual editing session, you can type
C-c . linux
It's better to set up emacs to do this for all C source code, and you
will want to tell Emacs to use spaces rather than tab characters.
See using your text
editor for fast compiles to find out how.
When you are typing your code and you come to the end of a line, type C-j, rather than a return. If you do this, Emacs indents the next line automatically according to the indentation rules. Alternatively, hitting Tab will indent the current line according to the current rules. If you have edited a function and ruined the indentation (or you fear you may have brace or parenthesis problems), you can indent an entire block by placing the cursor over the opening curly brace of the block and typing M-C-q.
Some more thoughts on good code and documentation
- For a trivial solution of a few lines of code, it may be best to have no explanation at all. (We're not entering the Obfuscated C Contest!)
- For a solution of a dozen lines of code, a sentence or two is usually sufficient.
- You should avoid writing tricky code whenever possible, but sometimes the best solution is subtle. In these rare cases, it may be appropriate to have an extensive comment even over a line or two of code, warning readers of whatever may not be obvious.
- For a solution of a hundred lines or more, I would expect several paragraphs of explanation. The explanation should cover not so much the code itself, but the organization of the code and the plan that produced it.
- For larger programs, especially programs that are divided into multiple files, it is appropriate to go through a more thorough design process, some elements of which should be used to explain your work.
README
file, not
in the code itself. It is often appropriate to include brief
comments about the overall architecture of a module in the source
code, especially if that structure is not immediately obvious from
the content and layout of the source itself.
Modularity and good documentation are complementary
Good programming style and documentation are essential ingredients in a clear and readable assignment. Good style includes the use of functions, modules, and abstract data types where appropriate.
- Document every representation.
- For all but the smallest problems, divide your own code into files, using the interfaces and implementations style. Use the slogan "every module hides a secret".
- Choose appropriate names for functions and variables:
  - Exported names should be long enough to be descriptive, like Seg_addhi or Segment_grow.
  - Private names, including the names of parameters, can be shorter, but they should still be descriptive, like add.
  - Often it's appropriate to use a "p" or "_p" suffix on pointer variables, e.g. student_p for a variable of type struct student *.
  - For short-lived temporaries, use C's conventions for one-letter names: i for index, n for integer, s for string, p for pointer, and so on.
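The naming advice above might look like this in practice. This is a sketch only: struct student, in_year, and Student_count_in_year are invented for the example and do not come from any CS 40 interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct student {
        const char *name;
        int year;               /* graduation year */
};

/* Private helper: a short name that is still descriptive. */
static bool in_year(const struct student *student_p, int year)
{
        assert(student_p != NULL);

        return student_p->year == year;
}

/*
 * Student_count_in_year (hypothetical exported name)
 * Returns how many of the first n entries of roster graduate in year.
 * It is a checked runtime error for roster to be NULL.
 */
int Student_count_in_year(const struct student roster[], int n, int year)
{
        int count = 0;

        assert(roster != NULL);

        /* i is a short-lived index, so a one-letter name is fine */
        for (int i = 0; i < n; i++) {
                if (in_year(&roster[i], year)) {
                        count++;
                }
        }

        return count;
}
```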
Use system library functions
Good style normally includes using library functions where possible, instead of duplicating effort by re-implementing something yourself. However, some libraries have such complex interfaces, or involve so much computational overhead, that you may be justified in avoiding them; when in doubt, ask the course staff.
Use good algorithms
Choose algorithms appropriate to the task at hand, and make your code structured and easy to read. Good use and placement of documentation is vital. Lots of comment text does not necessarily mean good documentation.
Appropriate technology choices
Good programmers understand that certain approaches to doing things involve way more computing time, memory space, or input/output to external files than other approaches. Code that makes unreasonable choices when applying these technologies will typically receive little or no credit for style and organization in CS 40. For example:
- Floating point arithmetic typically involves much more compute overhead than integer arithmetic or logical operations, and ensuring that significant data is preserved can be tricky when using floating point. Accordingly, floating point formats and libraries should typically not be used when equivalents are available using integers and/or logical operations. For example, when computing 2^n (2 to the power n) for n < 64 it is typically unacceptable to use a function call like pow(2, n), which uses a lot of floating point arithmetic to implement generalized exponentiation. The shift (uint64_t)1 << n is typically a far more efficient and more appropriate alternative. (Note that 2 << n would compute 2^(n+1), and that shifting a plain int 1 overflows once n reaches 31.)
- Writing and reading intermediate results to/from a temporary file is not allowed in CS 40 assignment solutions, unless specifically suggested or required in the assignment instructions. Reason: Writing data to a file or reading it back typically takes several milliseconds (thousandths of a second). Updating an entry in most reasonable data structures can be done in microseconds (millionths of a second), or sometimes even less. Remember, each CPU operation, such as adding two numbers, can be done in under a billionth of a second. So, your computer can do hundreds of thousands or perhaps millions of useful operations in the time it takes to write even a small amount of data into a file using fwrite or similar services. Most CS 40 assignments do not require that you discover the most optimized approach to solving the problem, but reasonable performance is expected. Unnecessarily writing and then reading back temporary files is thus typically unacceptable; solutions that use such approaches may incur a significant grading penalty, or may earn no credit at all, depending on the circumstance.
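To make the integer alternative to pow concrete, here is a minimal sketch. The function name power_of_two is hypothetical. The constant 1 is widened to 64 bits before shifting so that every n below 64 is representable; shifting 1 left by n positions is what yields 2 to the power n.

```c
#include <assert.h>
#include <stdint.h>

/*
 * power_of_two (hypothetical example)
 * Returns 2 raised to the power n using a single shift.
 * It is a checked runtime error for n to be 64 or more.
 */
uint64_t power_of_two(unsigned n)
{
        assert(n < 64);

        return (uint64_t)1 << n;
}
```

No floating point is involved, so the result is exact for every permitted n, including n = 63, where a double can no longer represent every neighboring integer.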
Last but not least...
www.kernel.org
.)