Grading standard for CS152 garbage collectors

This document explains how the garbage-collection homeworks were graded.

The grading was divided into two parts. The analysis was graded out of 50 points, broken down into 6 points for the proof, 12 points for the graphs, 12 points for the analysis and conclusions, and 20 points for the derivations. Each collector was worth 25 points.

Here is the grading standard for the proof:

     -1  Incomplete or incorrect proof that one of the functions which needs
         to maintain the invariant actually does.  For example, an important
         condition like "growHeap doesn't move hp" is omitted.
     -1  Doesn't make precise the notion of "before" or "after" hp.
     -1  Any other minor mis-statement indicating a possible lack of
         understanding but apparently not central to the proof.
     -2  Uses imprecise language, e.g. "heap is full" or "allocator goes to
         such a place".
     -2  Completely fails to show that one of the functions that needs to
         maintain the invariant(s) actually do(es).
     -3  Botches the idea of an invariant.
     -3  Botches the inductive nature of the argument.

Here is the grading standard for the graph:

     +2  Measurements with actual gamma rather than target gamma.
     +1  Several workloads that are different in significant ways.
     +1  Data points ranging upward from target-gamma approximating logarithmic
         spacing, enough so that you really see the shape of the graph.
     -1  Fewer than 6 data points.
     -1  Axes not properly labelled.
     -1  Workload not specified.
     -1  Bar graph where line graph more appropriate.
     -1  Excessively large scale crushes interesting data into tiny portion 
         of graph.
     -2  Fewer than 2 data points less than default gamma + 1.
     -3  Graphs total work against gamma rather than work per allocation.
     -6  Graphs anything other than total work or work per allocation against
         gamma.

Here is the somewhat approximate grading standard for the derivation:

     2   Realizes that GC work proportional to size of live data.
    2-5  Somewhat incorrect formula, but right idea.
    4-7  Essentially correct formula but not expressed in terms of gamma.
    7-10 Formula correct.  Exact grade based on quality/correctness of 
         explanation.

Formulae were classified as (1) right, (2) right but not in terms of gamma, (3) somewhat wrong, (4) completely wrong, and then points were adjusted depending on the explanation.
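
For reference, here is one common way to arrive at formulas "in terms of gamma". It is a sketch under stated assumptions, not necessarily the derivation the graders had in mind. Assume H is the total heap size in cells (both semispaces, for a copying collector), L is the amount of live data, gamma = H/L, each allocation takes one cell, the live data stays roughly constant, and a, b, and c are hypothetical per-cell cost constants. These assumptions give asymptotes at gamma = 1 for mark-and-sweep and gamma = 2 for stop-and-copy.

```latex
% Work per allocation = (work per collection) / (allocations between collections)
\begin{align*}
W_{\text{mark-sweep}}(\gamma)
  &= \frac{aL + bH}{H - L}
   = \frac{a + b\gamma}{\gamma - 1}, \\
W_{\text{copy}}(\gamma)
  &= \frac{cL}{H/2 - L}
   = \frac{2c}{\gamma - 2}.
\end{align*}
```

Here aL is the cost of marking the live data, bH is the cost of sweeping the whole heap, and cL is the cost of copying the live data.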

Here is the grading standard for the analysis and conclusions.

     +3  For any even minimally insightful commentary on your graph.
and  +3  For discussing how well your mark-and-sweep formula matched your
         experimental results.
and  +3  For discussing how well your stop-and-copy formula matched your
         experimental results.
and  +3  For comparing mark-and-sweep to stop-and-copy.
and  +2  For attempting to factor out the effect of heap growth.

but then

     -3  If your discussion of your formulas vs. your experimental results
         appeared to be incomplete, partially incorrect, or substantially
         uninsightful, or some combination of the three,
or   -6  If all of your work from this part of the assignment was
         extremely fragmentary, largely incorrect, or some combination 
         of the two.

In order to earn full credit for the analysis, students had to say something specifically about how well the formulas fit the measurements. If they derived formulas in the previous step that involved constants, they had to say something about having attempted to determine those constants experimentally and what they found out, or otherwise say something that made me believe that they'd given some serious thought to the correlation between the formulas and the numbers. For those who got the correct formula, noting the asymptote at gamma = 1 or gamma = 2 was generally sufficient. They also had to say something about the overall effectiveness of copying GC vs. mark-and-sweep GC, as mentioned in the assignment, or else they lost three points.

The grading of the collectors themselves was heavily based on style. There were generally no deductions for indentation, except in one egregious case where nearly everything was flush with the left margin, which cost two points. Any other stylistic error cost 1 point.

The biggest category of stylistic errors was encapsulation problems. You lost 1 point for each function that should have been static (local to the heap module) and wasn't, with a maximum of 2 points per collector. You lost 1 point for exporting any global variables from the heap module, or 2 points if you either (1) included non-extern declarations for variables in a header file or (2) made things that should have been private to the heap module belong to some other module and pulled them into the heap module via extern declarations. In certain cases I also deducted 1 point for global variables in the heap module that should have been local or static in some function within the module.
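
The encapsulation rules above can be sketched as follows. This is a minimal illustration, assuming a hypothetical single-file heap module with a fixed-size array; the names heap_alloc, heap_cells_used, bump, and HEAP_CELLS are invented for the example and are not taken from the assignment.

```c
/* heap.c: hypothetical heap module; all names here are illustrative. */
#include <assert.h>
#include <stddef.h>

#define HEAP_CELLS 1024          /* assumed fixed size, for the sketch only */

/* Private state: static gives these internal linkage, so no other module
   can reach them with an extern declaration. */
static int cells[HEAP_CELLS];
static size_t hp = 0;            /* index of the next free cell */

/* Private helper: static, because only this module calls it. */
static int *bump(size_t n)
{
    int *p = &cells[hp];
    hp += n;
    return p;
}

/* Public interface: the only names a header (heap.h) would declare. */
int *heap_alloc(size_t n)
{
    if (n > HEAP_CELLS - hp)
        return NULL;             /* a real collector would GC or grow here */
    return bump(n);
}

size_t heap_cells_used(void)
{
    assert(hp <= HEAP_CELLS);
    return hp;
}
```

Other modules see only the two public functions; anything else they need to know about the heap has to come through an accessor such as heap_cells_used rather than through an exported variable.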

You lost a point if you had any lines that wrapped past the 80-character margin. You lost a point if you had significant amounts of commented-out code (#ifdef was OK, comments were not) embedded in functions. You lost a point if you left in debugging functions that you should have removed, unless you surrounded them with #ifdef DEBUG or something of the sort. You lost a point if you printed out an excessive amount of debugging output (one extra line at interpreter exit was forgivable; half a line every time you followed a forwarding pointer was not). You lost a point if you changed the value of ARENASIZE.
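
One way to keep debugging code without paying for it is sketched below. The helper dump_forward, the macro TRACE_FORWARD, and the routine follow are hypothetical names made up for this example; only the #ifdef DEBUG idea comes from the text above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical debugging helper: compiled only when DEBUG is defined
   (e.g. cc -DDEBUG), so a normal build carries no trace of it. */
#ifdef DEBUG
static void dump_forward(void *from, void *to)
{
    fprintf(stderr, "forward %p -> %p\n", from, to);
}
#define TRACE_FORWARD(from, to) dump_forward((from), (to))
#else
#define TRACE_FORWARD(from, to) ((void)0)
#endif

/* Stand-in for a routine that would follow a forwarding pointer. */
int follow(int *from, int *to)
{
    assert(from != NULL && to != NULL);
    TRACE_FORWARD(from, to);     /* expands to nothing unless DEBUG is set */
    *to = *from;
    return *to;
}
```

In a normal build the trace call disappears entirely, so there is neither a leftover debugging function nor any extra output.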

You lost a point if you used float arithmetic for control purposes. Double was OK, integer was even better, float was not OK. This agrees with everything I've ever read about use of floats in C, and for good reasons, too: round-off error is a killer. Not to mention the architectural reasons, like FPU context switching (shudder). Obviously you more or less needed to use float or double to print out the statistics, and that was fine, but you should use integer tests to determine how much to actually grow the heap.
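
An integer-only growth test might look like the sketch below. It assumes that target-gamma is an integer percentage (so that 350 means the heap should be at least 3.5 times the live data); that interpretation, and the function name cells_to_grow, are assumptions made for the example. Overflow for extremely large heaps is ignored.

```c
#include <assert.h>

/* Assumption for this sketch: target-gamma is an integer percentage,
   so 350 means the heap should be at least 3.5 times the live data. */

/* Number of cells to add so that heap/live >= gamma_pct/100, using only
   integer arithmetic.  Returns 0 if the heap is already big enough. */
unsigned long cells_to_grow(unsigned long heap_cells,
                            unsigned long live_cells,
                            unsigned long gamma_pct)
{
    unsigned long wanted;

    assert(gamma_pct >= 100);   /* gamma below 1 cannot hold the live data */
    /* ceil(live_cells * gamma_pct / 100), computed without floating point */
    wanted = (live_cells * gamma_pct + 99) / 100;
    return wanted > heap_cells ? wanted - heap_cells : 0;
}
```

For example, with 400 live cells in a 1000-cell heap and a target of 350, the function asks for 400 more cells, bringing the heap to 1400 cells.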

If you asserted that target-gamma was bound to a NUMSX, you lost a point, but that deduction was a mistake; apply to have the point reinstated. You lost two points if you assumed that target-gamma was bound to a NUMSX without checking.

Many people lost points for problems with statistics. The maximum deduction for statistics-related problems was 4 points.

   -1 for not correctly printing GC stats every GC
   -1 for not correctly printing mem stats every 10th GC
   -1 for not correctly printing mem stats on exit
   -1 for not correctly printing overall stats on exit

The maximum deduction for not handling gamma correctly (independently of the above error-checking issue, which was considered separately) was 6 points. You lost 3 points if you only expanded the heap according to target-gamma when it was completely full. You lost all 6 points if you didn't implement target-gamma at all. You lost 2 points if you implemented it using some kind of weird magic that didn't interact properly with local environments.

For any error that affected functionality in a significant way, you generally lost 3 points, assuming that your code still ran all right in spite of the error. In a small number of very minor cases this was reduced to 2 points.

If your code didn't run, you lost 3 points off the top if you failed an assertion and 4 points if it core dumped. (Luckily everything compiled.) In addition, you lost 4 points for each major omission or error in your code, with the following exceptions. You lost only 2 points for writing forwardSx(x) instead of x = forwardSx(x), but you lost those 2 points every time you made that error (no upper bound). You lost 10 points if you wrote a garbage collector that copied the entire heap (including the garbage) from from-space to to-space. Finally, if I couldn't readily identify and fix specific problems and thereby get your collector to work, you got 0 for that collector.
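
The forwardSx mistake is easy to see in a toy model. The sketch below does not use the interpreter's real types; Cell, forward_cell, and in_to_space are hypothetical stand-ins, with forward_cell playing the role of forwardSx.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not the interpreter's real types: a cell holds a value and,
   once copied, a pointer to its new location in to-space. */
typedef struct Cell {
    int value;
    struct Cell *forward;   /* NULL until the cell has been copied */
} Cell;

#define TO_CELLS 8
static Cell to_space[TO_CELLS];
static size_t next_free = 0;

/* Hypothetical stand-in for forwardSx: copy the cell on first visit and
   return its to-space address.  It cannot update the caller's pointer. */
Cell *forward_cell(Cell *p)
{
    if (p == NULL)
        return NULL;
    if (p->forward == NULL) {
        assert(next_free < TO_CELLS);
        to_space[next_free] = *p;
        p->forward = &to_space[next_free++];
    }
    return p->forward;
}

/* Returns 1 if root points at a to-space cell, 0 otherwise. */
int in_to_space(const Cell *root)
{
    size_t i;

    for (i = 0; i < TO_CELLS; i++)
        if (root == &to_space[i])
            return 1;
    return 0;
}
```

Calling forward_cell(root) and discarding the result copies the cell but leaves root pointing into from-space; only root = forward_cell(root) moves the root itself.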

The test cases that were used to make the determination as to whether your code "ran" were a sample 1000-item merge sort and a sample 1000-item insertion sort, with target-gamma not set and with target-gamma set to 350. Everyone got the same sample input. Your code had to compile and run using cc on ice and it had to work on the sample input.

The above deductions were all cumulative. Most people whose code didn't run got almost no points for that collector because the deductions for things that don't work add up very quickly.

Finally, "gross inefficiency" got a 1-point deduction, never applied more than twice to the same assignment. This basically meant any algorithm which drastically increased time complexity. In particular, copying dead or irrelevant data got flagged. Repeated traversals of the arena list to calculate its size were permitted although frowned upon. As a special case, copying the heap once for every ARENASIZE cells added, even if the heap growth was a large multiple of ARENASIZE, cost 3 points.
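
The special case above can be avoided by working out the total growth first and copying once. The sketch below only simulates the copy with a counter; the ARENASIZE value shown is an assumed placeholder, and grow_heap, copy_heap, and heap_copies are hypothetical names.

```c
#include <assert.h>

#ifndef ARENASIZE
#define ARENASIZE 1000   /* assumed value for the sketch; use the real one */
#endif

static unsigned long copies = 0;   /* counts simulated heap copies */

/* Stand-in for the expensive step: copy live data into the enlarged heap. */
static void copy_heap(void)
{
    copies++;
}

/* Grow by at least extra_cells, rounding up to whole arenas, and copy the
   heap once at the end rather than once per arena.  Returns arenas added. */
unsigned long grow_heap(unsigned long extra_cells)
{
    unsigned long arenas = (extra_cells + ARENASIZE - 1) / ARENASIZE;

    if (arenas > 0)
        copy_heap();
    assert(arenas * ARENASIZE >= extra_cells);
    return arenas;
}

/* Accessor so callers can check how many copies were made. */
unsigned long heap_copies(void)
{
    return copies;
}
```

Growing by a little more than five arenas adds six arenas but performs a single copy, instead of six.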