Readability of Programming Assignments

In this course, you will solve a diverse collection of individual problems A solution to a problem is of little value if that solution cannot be understood and modified by others. We rely on two major components to ensure our solutions are readable: good programming style and documentation. Good style includes the use of functions to abstract common patterns found in your code and taking care to make your code structured and easy to read. Documentation should focus on high-level issues such as design and organization, data representation, and high-level descriptions of functions. Lots of comment text is usually a sign of poor documentation.

The rest of this page provides details on the style and documentation practices valued in CS 105, followed by a full rubric for the code you will write for CS 105 assignments.

Requirements for Coding Style

We emphasize readability and clarity. Work that we find obscure or difficult to read will earn lower grades. Here are two specific style constraints that your code must follow:
  • Your code should not wrap when displayed in 80 columns, and it must not contain TAB characters. (Why this stuff matters and how to deal with it)
  • All submitted code must obey the offside rule, as explained below.

The offside rule

The offside rule comes into play when a definition, declaration, expression, or statement (which we'll call a construct) has to be split across multiple lines of code. Roughly speaking, it says that when you split a construct, later lines have to be indented with respect to the beginning of the construct. They may never be "outdented" to the left. The rule is based on the start of the construct, not the start of the line.

Here's the rule given more precisely:
Code respects the offside rule if and only if whenever the first token of a construct appears in column i, every other token appears in a column greater than i.
Using Lisp syntax, the rule is very easy to interpret: any token enclosed in parentheses must appear to the right of the opening parenthesis. Here's an example of code that violates the offside rule:
(define gcd (m n)
    (begin (while (!= (set r (mod m n)) 0)
        (begin
          (set m n)
          (set n r)))
      n))
The problem is with the second begin: it is part of the body of the while loop, so it should appear to the right of the while.

Here is the same function formatted in a way that respects the offside rule:
(define gcd (m n)
    (begin
      (while (!= (set r (mod m n)) 0)
        (begin
         (set m n)
         (set n r)))
      n))
And here is the same function formatted more compactly. The layout isn't aesthetically optimal here, but it does respect the offside rule:
(define gcd (m n) (begin
                    (while (!= (set r (mod m n)) 0)
                       (begin (set m n) (set n r)))
                    n))
Here is a C declaration that violates the offside rule:
void *big_function_name(const void *groupings, const void *key, void *value, int
                                                                     total);
The problem is in the declaration of the last parameter: total appears to the left of the first token, int. This code was created by a computer program that squeezes wide code into 80 columns but may violate the offside rule. A more clever program, or a human being, might format the code like this:
void *big_function_name(const void *groupings, const void *key, void *value, 
                        int total);
which respects the offside rule.

Requirements for Documentation

Documentation can take two forms: large-scale or small-scale. Large-scale documentation takes the form of paragraphs, often in a README file, explaining major design decisions for multi-file projects that together solve a problem. In CS 105, most of your problem-solving will be done with small, individual functions; such small solutions require small-scale documentation, which takes the form of comments written directly in the code. This small-scale documentation should answer these questions:
  • What is the contract of each function? In the "real world", the contract of a multi-line function should be captured with a comment, while the contract of a one-line function should be evident merely from the name of the function and from the names and types of its arguments and result. However, in CS 105 we are developing your documentation skills, so you will often be asked to write contracts for every function you write.
    There's more about contracts below.
  • What are the arguments of your functions? For example, is a "list of edges" the list of all edges in the graph or the list of edges not yet visited?
  • Are there any error or edge cases?
Please don't use comments to repeat information that is readily available in the code. Here is an example of poor documentation:
; (example of poor commenting style)
;
;  Function: caar
;  Description: Returns the car of the car of the given element.
;
(define caar (x)
  (car (car x)))
Everything in the comment is more easily learned just by looking at the code. Here is an example of better documentation:
; (example of better commenting style)
;
; Visit vertex and every successor of vertex that appears in edges,
; in depth-first order.  Executed for side effects on the global 
; variables output and visited.
; Graph is the complete list of edges in the graph; 
; we use structural recursion on edges.  

(define dfsvisit (vertex edges graph) 
  (if (null? edges) '()
      ( ...  (dfsvisit mumble (cdr edges) graph) ...)))
The comment explains what the function is doing, notes that there is no return value but there are side effects, and explains what the parameters mean. This comment should be supported by comments on the global variables output and visited, saying what they represent and any requirements imposed on those representations.

Here are two examples of very poor commenting style:
++i;   /* Use the prefix autoincrement operator to increase i by one. */

for (;;) {   /* Infinite loop. */

What is a contract?

A contract is associated with a function and relates the values of variables and other machine state at exactly two points in time:
  • The point at which the function is called
  • The point at which the function returns
Functions that do not refer to global variables or to mutable state on the heap have exceptionally simple contracts. (This simplicity is one reason people like them.)

As an example, a function to compute the greatest common denominator of two natural-number arguments m and n might have this contract:
Returns the largest integer that evenly divides both m and n.
Another example of a contract would be for a sort function:
Takes a list of integers ns and returns a new list containing the same elements as the argument list, but in nondecreasing order.
Contracts should be included as a comment above the function.

Here is an example of documentation that does not express a contract:
The function walks through the tree and at each step checks to see if the node contains a prime number. If not, it checks the left subtree and the right subtree for prime numbers.
There are some signs that something is very wrong:
  • The documentation says nothing about what the function returns.
  • The documentation contains a narrative description of an algorithm, i.e., a computation that evolves over multiple points in time.
  • The algorithm being described is almost impossible to write a contract for. Probably the author intended to implement a different algorithm.
Here is a good contract for a related algorithm:
Takes as argument a binary-search tree full of integers, and returns the smallest prime number in that tree, or if the tree contains no prime number, returns -1.

Coding Rubric

Exemplary Satisfactory Must improve
Documentation
  • The contract of each function is clear from the function's name, the names of its parameters, and the documentation.
  • Each function complex enough to require an explicit contract has a comment explaining what the function returns in terms of its parameters, which are mentioned by name.
  • When not clear from just parameter names, the contract includes constraints on what inputs are permissible, e.g. that n must be nonnegative.
  • The contract of each function is written without case analysis, or case analysis was unavoidable.
  • Documentation appears consistent with the code being described.
  • A function’s contract omits some parameters.
  • A function’s documentation includes information that is redundant with the code, e.g., "this function has two parameters."
  • A function's contract omits some constraints on parameters, e.g., forgetting to say that the contract requires nonnegative parameters.
  • A function's contract includes a case analysis that could have been avoided, perhaps by letting some behavior go unspecified.
  • A function's documentation includes a narrative description of what happens in the body of the function, instead of a contract that mentions only the parameters and result.
  • A function’s documentation neither specifies a contract nor mentions every parameter.
  • Documentation appears inconsistent with the code being described.
Form
  • All code fits in 80 columns.
  • The submitted code contains no tab characters.
  • All code respects the offside rule.
  • Indentation is consistent everywhere.
  • Indentation leaves most code in the left half or middle part of the line.
  • In bridge languages, if a construct spans multiple lines, its closing parenthesis appears at the end of a line, possibly grouped with one or more other closing parentheses.
  • No code is commented out.
  • All unit tests (check-expect, check-assert, and check-error) are indented by 8 spaces.
  • Code is tested using unit-test forms, not print calls. Submitted solutions do not contain any code that prints a value.
  • One or two lines are substantially wider than 80 columns.
  • The code contains one or two violations of the offside rule.
  • In one or two places, code is not indented in the same way as structurally similar code elsewhere.
  • Indentation pushes significant amounts of code to the right margin.
  • Solution files may contain clearly marked test functions, but they are never executed. It’s easy to read the code without having to look at the test functions.
  • Three or more lines are substantially wider than 80 columns.
  • An ASCII tab character lurks somewhere in the submission.
  • The code contains three or more violations of the offside rule.
  • The code is not indented consistently.
  • Indentation pushes significant amounts of code so far to the right margin that lots of extra line breaks are needed to stick within the 80-column limit.
  • The closing parenthesis of a multi-line construct is followed by more code (or by an open parenthesis) on the same line.
  • A closing parenthesis appears on a line by itself.
  • Solution file contains commented-out code.
Naming
  • Each function is named either with a noun describing the result it returns, or with a verb describing the action it does to its argument. (Or the function is a predicate and is named as suggested below.)
  • A function that is used as a predicate has a name that is formed by writing a property followed by a question mark. Examples might include even? or prime?. (Applies only if the language permits question marks in names.)
  • In a function definition, the name of each parameter is either (a) a noun saying what, in the world of ideas, the parameter represents, (b) a name given in the problem statement, or (c) conventional, e.g., an index might be i, j, or k, a string might be s, a variable might be x, an expression might be e.
  • Functions' names are overly complex or don't convey enough specific information about what the function does.
  • A function that is used as a predicate (for if or while) does not have a name that ends in a question mark. (Applies only if the language permits question marks in names.)
  • Although the name of a parameter is not short and conventional, not an English noun, and not a name from the math or the problem, it is still recognizable; perhaps as an abbreviation or a compound of abbreviations.
  • Auxiliary functions are given names that simply indicate a vague relationship with another function. Often such names are formed by combining the name of the other function with a suffix such as aux, helper, 1, or even _.
  • A function's name is not connected to what it does or returns.
  • The name of some parameter is not recognizable.
Laws
  • When defining function f, each left-hand side applies f to one or more patterns, where a pattern is a form of input (examples: (+ m 1), (cons x xs)).
  • When a law applies only to equal inputs, those inputs are notated with the same letter.
  • For every permissible form of the function’s input or inputs, there is an algebraic law with a matching left-hand side (and a matching side condition, if any).
  • The patterns of the left-hand sides of laws defining function f are either mutually exclusive or are distinguished with side conditions written on the right-hand side.
  • Every variable on the right-hand side of every law appears on that law’s left-hand side.
  • On a left-hand side, f is applied to a form of input, but the form of input is written in a way that is not consistent with code.
  • When a law applies only to equal inputs, the equality is written as a side condition.
  • For every permissible form of the function’s input or inputs, there is an algebraic law with a matching left-hand side, but some inputs might inadvertently be excluded by side conditions that are too restrictive.
  • Side conditions appear on the left-hand side of a law.
  • One or more left-hand sides contain laws that are not applications of f.
  • On a left-hand side, f is applied to something that is not a form of input.
  • There is permissible input whose form is not matched by the left-hand side of any algebraic law.
  • There is at least one input to which it is ambiguous which law should apply: the input matches more than one left-hand side, and either there are no side conditions, or the side conditions are insufficient to distinguish the ambiguous laws.
  • The right-hand side of a law mentions a variable that does not appear on that law’s left-hand side.
Structure
  • The function begins with a case analysis that finds which algebraic law applies.
  • The function contains one case for each algebraic law.
  • In every case analysis, all cases are necessary.
  • The code of each function is so clear that, with the help of the function's contract, course staff can easily tell whether the code is correct or incorrect.
  • There's only as much code as is needed to do the job.
  • In the body of a recursive function, the code that handles the base case(s) appears before any recursive calls.
  • The function contains a case for each algebraic law, but it also contains an additional, redundant case, for a situation that is covered by other cases in the code.
  • Course staff have to work to tell whether the code is correct or incorrect.
  • There's somewhat more code than is needed to do the job.
  • Code for one or more base cases appears after a recursive call.
  • The case analysis in the code is not obviously connected to the algebraic laws (serious fault).
  • A significant fraction of the case analyses in the code are redundant.
  • From reading the code, course staff cannot tell whether it is correct or incorrect or what it's doing.
  • There's about twice as much code as is needed to do the job.
  • Code can be simplified by applying algebraic laws. For example, the code says (+ x 0), but it could say just x.
Testing
  • Tests are written using the provided unit-testing forms.
  • Expected outputs are tested using check-assert for Boolean functions and check-expect for other functions.
  • Every test checks behavior that is required by the function's contract.
  • There is one unit test for each permissible form of input.
  • To the degree possible, every Boolean function is tested with every form of input for both "true" and "false" results. (Not every result is always possible with every form of input.)
  • An expected Boolean output is tested by using check-expect.
  • There is a test for each permissible form of input.
  • Boolean functions are tested with only one result per form of input.
  • Tests are written as top-level expressions.
  • An expected output is tested by using check-assert with = or other equality test.
  • Some test uses inputs that violate the function's contract or provoke other undefined behavior.
  • There are so many tests that course staff can't easily identify the basic tests for each form of input
Correctness
  • Functional correctness tests pass with no faults.
  • Or, under test, staff functional correctness tests identify tiny faults arising from problems with arithmetic overflow or from some confusion about exactly what numbers are prime.
  • Staff functional correctness tests identify a few faults.
  • Or, staff functional correctness tests identify a single fault that shows a lack of understanding.
  • Staff functional correctness tests identify a preponderance of faults.
  • Staff functional correctness tests fail because the names of helper functions are spelled differently in different places (serious fault).
  • When we attempt to load solution code, there are errors (No Credit).