\subsection*{Garbage collection}

Software due Friday, October 22, at 12:10AM (a.k.a. Thursday night at
midnight).\htmlBR
Write-ups due \textit{on paper} Friday, October 22, at 5:10PM in Maxwell
Dworkin 231. 

The purposes of this exercise are 

\begin{itemize}
\item To review the implementation of mark-and-sweep garbage collection, which
you have seen in CS~51. 
\item To understand how to bound mark-and-sweep collection times by
interleaving the sweep phase with the allocator. 
\item To understand how to implement a simple copying collector. 
\item To study and understand the effects of heap size on performance in both
mark-and-sweep and copying collectors. 
\end{itemize}

This is a long assignment and will count 100~points: double the value of a
typical assignment. Before you begin work, 

\begin{itemize}
\item Read and understand the heap interface from the Stage~1 collector. 
\item Read through the entire assignment. 
\end{itemize}

Your GC study is divided into six stages. We have implemented the first two
stages. You should be able to implement each of the final four stages
separately. Make sure you understand the entire stage before beginning work on
it. The code you need to get started is in \texttt{{\htmlTilde}cs152/asst/gc}.


If you have trouble with one of the stages, \textbf{seek help}. Don't just go
on to the next stage and assume things will work out. 

\begin{enumerate}
\item \textbf{Stage 1: Arenas}. In stage 1, we observe that [[mkSx]] is the
only function that allocates S-expressions with [[malloc]]. We have changed
that [[malloc]] to [[allocSx]]. [[allocSx]] uses a linked list of ``arenas.''
For our mark-and-sweep collector, the arena is the unit in which the
memory-management code gets memory from the operating system. The collection
of all arenas will constitute the ``managed heap.'' 

Each arena is an array holding some constant number of S-expressions, with
mark bits, e.g.: 
<<example>>=
#define GROWTH_UNIT 24
struct markedsx { /* S-expression with mark bit */
  struct sx sx;
  unsigned live:1;
};
struct arena {
  struct markedsx pool[GROWTH_UNIT];
  struct arena *tail;   /* next arena on the list */
};
@ Notice that arenas are chained on a linked list; the head of that list is
called [[heap]]. One arena is always the current free arena, and the variable
[[freeArena]] points to it. The heap pointer [[hp]] points to the next available
S-expression in the current free arena. The limit pointer [[heap_limit]]
points just beyond the last S-expression in the current free arena, so at any
moment the number of free S-expressions in the current free arena is
[[heap_limit - hp]]. 
<<example>>=
struct markedsx *hp, *heap_limit;
struct arena *heap, *freeArena;
@ Here's a picture of what the arenas look like in the middle of an allocation
cycle: \htmlBR
[IMAGE]\htmlBR
Dark shaded areas are used, while light areas are available for allocation.
Except for the current free arena, arenas are either entirely used or entirely
available. [[heap_limit - hp]] gives the size of the light area in the
current arena. Of course, this picture doesn't make sense for the stage 1 we
have implemented---in the implementation we give you, the current arena is
also always the last one. Also, once we have a garbage collector, ``used'' and
``available'' will be only approximations to what the dark and light areas
represent. 

We have implemented a function [[allocSx]], which takes an S-expression (that
is, a free cell) from the current free arena. If there are no S-expressions
remaining in the current arena, it calls [[growHeap]] to add a new arena to
the tail of the list of all arenas. 
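
The Stage~1 allocation path can be pictured with the following minimal sketch. It is not the supplied code: the one-field [[struct sx]] is a stand-in, the return type of [[allocSx]] is assumed to be a pointer to [[struct sx]], and the use of [[calloc]] to clear mark bits is our choice.

```c
#include <assert.h>
#include <stdlib.h>

#define GROWTH_UNIT 24

/* Stand-in for the interpreter's S-expression; the real struct sx is richer. */
struct sx { int ty; };

struct markedsx {            /* S-expression with mark bit */
    struct sx sx;
    unsigned live:1;
};

struct arena {
    struct markedsx pool[GROWTH_UNIT];
    struct arena *tail;
};

static struct markedsx *hp, *heap_limit;
static struct arena *heap, *freeArena;

/* Append a fresh, zeroed arena to the tail of the list and make it current. */
static void growHeap(void) {
    struct arena *a = calloc(1, sizeof(*a));
    assert(a != NULL);
    if (heap == NULL) {
        heap = a;
    } else {
        struct arena *last = heap;
        while (last->tail != NULL)
            last = last->tail;
        last->tail = a;
    }
    freeArena  = a;
    hp         = a->pool;
    heap_limit = a->pool + GROWTH_UNIT;
}

/* Stage 1 allocation: bump hp, growing the heap when the arena is used up. */
static struct sx *allocSx(void) {
    if (hp == heap_limit)      /* also true before the first arena exists */
        growHeap();
    assert(hp < heap_limit);
    return &(hp++)->sx;
}
```

Because [[hp]] and [[heap_limit]] both start out null, the first call to [[allocSx]] creates the first arena without any special case.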

\item \textbf{Stage 2: Roots}. The next step in preparing a collector is to
identify the roots: all the places where pointers might lead to live
S-expressions. Another way to say it is we have to know what [[Sx]]s might
possibly be used after a call to [[allocSx]]. These certainly include every
[[Sx]] reachable from every [[rho]] of every invocation of [[eval]], as well
as various temporary values and environments. Some S-expressions are also live
during parsing. 

To solve this stage, we have broken the problem down by type. 

\begin{itemize}
\item First, we have identified all the types that might contain a pointer
leading to a value of type [[SEXP]]. Any of the following types can contain
pointers to an  [[SEXP]]: 

\begin{itemize}
\item [[Sx]], through the [[car]] and [[cdr]] fields of [[u.list]]. 
\item [[Valuelist]], because it is a list of [[Sx]]. 
\end{itemize}

And then we can get pointers indirectly through values of the following types:


\begin{itemize}
\item [[Env]], because it contains a [[Valuelist]]. 
\item [[Ast]], through the [[VAL]] field. 
\item [[Astlist]], as a list of [[Ast]]s. 
\item [[Sx]] again, when it is a closure, which holds both [[Ast]] and
[[Env]]. 
\item [[Exp]], through the [[sx]] field. 
\item [[Explist]], as a list of [[Exp]]s. 
\item [[Input]], as it may contain [[Ast]]. 
\end{itemize}

For each of these types, we have written a tracing function that follows
pointers and marks S-expressions. 

\item The next step is identifying the root set.  Any value of any of the
types named above is potentially a root if it could be live during a
traced-state allocation. The set of roots therefore includes: 

\begin{enumerate}
\item [[globalEnv]] 
\item [[ipt]] in [[main]] 
\item potentially, local variables and actual parameters of any function that
calls a function that could allocate. (A function could allocate either if it
calls [[allocSx]], or if it calls a function that could allocate.) These
include: 
\begin{itemize}
\item every [[rho]] of every activation of [[eval]] 
\item every [[s]] in every activation of [[evalList]]  ([[s]] is the
\textit{result} of a previous [[eval]] that's being saved for later use.  To
understand the details, look at the source  for [[evalList]].) 
\item a great many roots in parsing and converting [[Exp]] to [[Ast]],
culminating in  the value returned by [[readInput]] 
\end{itemize}
\end{enumerate}

The functions that call functions that might allocate are 

\begin{quote}
\texttt{readInput}, \texttt{parseExp}, \texttt{parseEL}, \texttt{expToInput},
\texttt{expToAst}, \texttt{explistToAstlist}, \texttt{getBindings},
\texttt{expToSx}, \texttt{explistToSx}, \texttt{mkSYM}, \texttt{mkPRIM},
\texttt{main}, \texttt{eval}, \texttt{evalList}, \texttt{applyClosure},
\texttt{applyValueOp}, \texttt{applyArithOp}, \texttt{mkCLO}, \texttt{mkNUM},
\texttt{mkLIST}, and \texttt{mkSx}. 
\end{quote}

In each of these functions, we must do work to ensure that every potential
root is visible to the garbage collector. To make the job easier, we have used
the following precondition: 

\begin{quote}
\textit{When any of these functions is called, it is guaranteed that its
arguments are reachable through some root already on the root stack.}

We call a value \textit{traced} if it is reachable through some root on the
stack. We can therefore state the requirements as follows: \textit{any of
these functions may assume that its arguments are traced}, and \textit{the
caller of any of these functions must guarantee that its arguments are
traced}. 
\end{quote}

Using this precondition simplifies things, and without it, we would find
ourselves putting multiple copies of the same [[rho]] on the root stack over
and over again. 

The code we supply you already includes the proper calls to [[pushRoot]] and
[[popRoots]]. 
\end{itemize}
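
To make the precondition concrete, here is a small sketch of how a root stack might be used. Everything in it is hypothetical: the layout of [[struct root]], the fixed capacity, the helper [[pushSxRoot]], and the argument to [[popRoots]] are guesses for illustration, and the supplied [[pushRoot]] and [[popRoots]] may have different signatures.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sx *Sx;               /* assumption: Sx is a pointer type */
struct sx { int ty; };               /* stand-in for the real S-expression */

enum rootkind { SX, ENV };           /* hypothetical subset of root kinds */

struct root {
    enum rootkind kind;
    union { Sx *sx; void *env; } u;  /* address of the variable holding the root */
};

#define MAX_ROOTS 1024               /* hypothetical fixed capacity */
static struct root rootStack[MAX_ROOTS];
static size_t nroots;

/* Register the address of an Sx variable so the collector can find it. */
static void pushSxRoot(Sx *p) {
    assert(nroots < MAX_ROOTS);
    rootStack[nroots].kind = SX;
    rootStack[nroots].u.sx = p;
    nroots++;
}

/* Unregister the n most recently pushed roots. */
static void popRoots(size_t n) {
    assert(n <= nroots);
    nroots -= n;
}

/* Usage pattern: the argument is traced by precondition, so the callee pushes
   only its own temporary, which must survive a call that could allocate. */
static size_t rootsDuringCall(Sx arg) {
    Sx tmp = arg;              /* plays the role of s in evalList */
    pushSxRoot(&tmp);
    size_t visible = nroots;   /* a real callee would call an allocating function here */
    popRoots(1);
    return visible;
}
```

Pushing the address of the variable, rather than its value, matters later: a copying collector has to be able to update the variable when the cell moves.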

In stage 3, you'll have to trace pointers starting at all of the roots listed
above. To help ensure that all roots are visible, we have implemented a new
primitive, [[show-roots]], whose only purpose is to help debug the garbage
collector. [[show-roots]] finds and prints out all the roots, with identifying
commentary. It should be possible to call [[(show-roots)]] at any time in any
Scheme program. Our [[show-roots]] is incomplete, but you may wish to complete
it if you have trouble. Here's an example that illustrates that [[s]] in
[[evalList]] is live, with value~6: 
<<*>>=
-> (+ (* 2 3) (begin (show-roots) 7))
ROOTS FOR GARBAGE COLLECTION:
  S-expression () -- permanent representation of '()
  S-expression T -- permanent representation of 'T
  Env -- permanent global environment
  Input -- the most recent input
  S-expression <primitive: +> -- function about to be applied in eval
  S-expression 6 -- temporary value s in evalList
  S-expression <primitive: show-roots> -- function about to be applied in eval
  Values () -- arguments of function about to be applied

13

-> 
@ At the time [[(show-roots)]] is evaluated, [[evalList]] has already been
called to evaluate [[(* 2 3)]], and the result of that evaluation ([[6]]) is
sitting in the local variable [[s]]. 

Here's a similar example: 
<<examples>>=
-> (set list3 (lambda (x y z) (cons x (cons y (cons z '())))))
<closure>

-> (list3 (+ 1 2) (* 1 2) (/ 1 (begin (show-roots) 2)))
ROOTS FOR GARBAGE COLLECTION:
  S-expression () -- permanent representation of '()
  S-expression T -- permanent representation of 'T
  Env -- permanent global environment
  Input -- the most recent input
  S-expression <closure> -- function about to be applied in eval
  S-expression 3 -- temporary value s in evalList
  S-expression 2 -- temporary value s in evalList
  S-expression <primitive: /> -- function about to be applied in eval
  S-expression 1 -- temporary value s in evalList
  S-expression <primitive: show-roots> -- function about to be applied in eval
  Values () -- arguments of function about to be applied

(3 2 0)

-> 
@ You can see some other examples that use [[show-roots]] online. To see what
[[show-roots]] is supposed to do in any particular function, you can run
\texttt{{\htmlTilde}cs152/bin/scheme-ms}, which has the [[show-roots]]
primitive. 
@ \item \textbf{Stage 3: Basic mark-and-sweep collection}. In this stage, you
will \textit{ implement a mark-and-sweep collector for S-expressions}. We have
broken this exercise into several steps. 

\begin{itemize}

\item Write a procedure [[gc]] that implements the mark phase of a
mark-and-sweep garbage collector.  At each collection, traverse the root set
and mark each reachable S-expression as [[live]]. This should be dead easy,
given that we've already provided the marking procedures. Your [[gc]]
procedure will call the appropriate marking procedure for each of the roots,
and it will reset [[hp]] and [[heap_limit]] to point back to the first arena.


Instrument your collector to gather statistics about memory usage and live
data, and have the [[gc]] procedure print out  statistics every time it is
called.  Such statistics should include the total number of S-expressions on
the heap, the number that are live at the given collection, and the ratio of the
heap size to live data. (Note that we're talking about the size of our managed
heap, i.e., the collection of all arenas, not the real C~heap.  Our heap is
managed with [[allocSx]] and [[gc]].  The C~heap is managed with [[malloc]]
and [[free]].) Use the following format for your output: 

\begin{PRE}
[GC stats: heap size 3960, live data 3528, ratio 1.12]
\end{PRE}

Use exactly two digits after the decimal point to show the ratio. If there is
no live data, your ratio should print as \texttt{infinite}. 

Please also print some statistics about total memory usage. In the following
format, tell us the total number of cells allocated and the current size of
the heap: 

\begin{PRE}
[Mem stats: allocated 244911, heap size 3960, ratio 61.85]
\end{PRE}

Please print this information after every 10th garbage collection, and when
your interpreter exits. This will give you some intuition on how garbage
collection allows you to reuse memory; in the example above, I was able to run
in 60~times less memory than I could have without garbage collection. 

Finally, when your interpreter exits, print out the total number of
collections and the number of cells marked during those collections: 

\begin{PRE}
[Total GC work: 745 collections traced 1988260 cells]
\end{PRE}

In order to gather these statistics, you'll have to change your marking
procedures into functions that return the number of cells marked. 

\item Instead of having separate unmark and sweep phases, integrate the unmark
and sweep phases with [[allocSx]].  In particular:  

\begin{itemize}
\item Have [[gc]] reset the [[freeArena]] pointer to the beginning  of the
list of all arenas, and reset [[hp]] and [[heap_limit]] to  point to the
start and end of that arena.  
\item In [[allocSx]], instead of just taking the cell pointed to  by~[[hp]],
check to see if  it is [[live]].  If so, skip past it (and mark it not live). 

\item If [[allocSx]] can't find a free cell in the current  [[freeArena]],
advance to the next arena.  
\end{itemize}

You may find it helpful to split [[allocSx]] into two functions:  one that
attempts to allocate without calling the garbage collector, but sometimes
fails, and another that may call the first function, the collector, and
[[growHeap]]. 

\item If [[allocSx]] gets all the way to the end of the last arena without
finding a free cell, it should call [[gc]] to try to create more space. 

\item As it stands, the system will loop forever  if the collector can't
reclaim any garbage.  Fix [[allocSx]] to call [[growHeap]] in that
circumstance. 

\item \textit{Prove} that in this scheme, the garbage collector is called only
when no S-expression is marked [[live]]. {\itshape

\begin{itemize}
\item Hint \#1: begin with an invariant for the current arena pointed to by
[[freeArena]].  You'll have one invariant for [[freeArena->pool[i]]]
when [[&freeArena->pool[i] < hp]], and a different invariant
for [[freeArena->pool[i]]] when [[&freeArena->pool[i] >= hp]].
Your [[growHeap]] procedure will have to establish this
invariant for every new arena.  
\item Hint \#2: extend the invariant to include arenas before and after the
current [[freeArena]].  The invariants for these two kinds of arenas should
look an awful lot like the invariants for [[freeArena->pool[i]]] where
[[&freeArena->pool[i] < hp]] and [[&freeArena->pool[i] >= hp]].  
\item Hint \#3: show that [[growHeap]], [[allocSx]], and [[gc]] all maintain
this invariant.  
\item Hint \#4: show that the conditions under which [[gc]] is called, together
with the invariant, imply that [[gc]] is called only when no S-expression is
marked [[live]].  
\end{itemize}

} \textbf{You are likely to have an easier time with your implementation if
you work on the proof first.} A thorough understanding of the invariants
should make the implementation relatively easy. 
\end{itemize}
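
The statistics formats requested above can be produced with ordinary [[printf]]-style formatting. The sketch below uses two helper names of our own invention, [[formatGCStats]] and [[formatMemStats]], and writes into a caller-supplied buffer so the result is easy to check. A real collector would more likely print directly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the per-collection line; the ratio prints as "infinite"
   when there is no live data. Returns buf. */
static char *formatGCStats(char *buf, size_t size, long heapSize, long liveData) {
    assert(buf != NULL && size > 0);
    if (liveData == 0)
        snprintf(buf, size,
                 "[GC stats: heap size %ld, live data %ld, ratio infinite]",
                 heapSize, liveData);
    else
        snprintf(buf, size,
                 "[GC stats: heap size %ld, live data %ld, ratio %.2f]",
                 heapSize, liveData, (double)heapSize / (double)liveData);
    return buf;
}

/* Format the total-memory line: cells allocated so far versus heap size. */
static char *formatMemStats(char *buf, size_t size, long allocated, long heapSize) {
    assert(buf != NULL && size > 0 && heapSize > 0);
    snprintf(buf, size,
             "[Mem stats: allocated %ld, heap size %ld, ratio %.2f]",
             allocated, heapSize, (double)allocated / (double)heapSize);
    return buf;
}
```

With the numbers from the sample lines, [[formatGCStats]] yields a ratio of 1.12 and [[formatMemStats]] a ratio of 61.85.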

Remember that the purpose of the garbage collector is to limit the amount of
memory we have to request from the C~heap.  If you call [[malloc]] anywhere
but from within [[growHeap]], you are doing something wrong.  (But you don't
have to eliminate any existing calls to [[malloc]]; we've taken care of that
in stage~1.)  
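
One way the integrated unmark-and-sweep allocator of this stage might be organized, with [[growHeap]] as the only caller of the C~allocator, is sketched below. It is an illustration under stated assumptions, not a reference solution: the [[roots]] array is a toy stand-in for the real root stack and tracing procedures, [[tryAlloc]] and [[useArena]] are names we made up, and [[GROWTH_UNIT]] is set to~4 only to keep the example small.

```c
#include <assert.h>
#include <stdlib.h>

#define GROWTH_UNIT 4                 /* small, to keep the example easy to follow */

struct sx { int ty; };                /* stand-in for the real S-expression */
struct markedsx { struct sx sx; unsigned live:1; };
struct arena { struct markedsx pool[GROWTH_UNIT]; struct arena *tail; };

static struct markedsx *hp, *heap_limit;
static struct arena *heap, *freeArena;

/* Toy stand-in for the root set: the cells the program still uses. */
#define MAX_ROOTS 64
static struct markedsx *roots[MAX_ROOTS];
static int nroots;

/* Make a the current free arena (or none, when a is NULL). */
static void useArena(struct arena *a) {
    freeArena  = a;
    hp         = a ? a->pool : NULL;
    heap_limit = a ? a->pool + GROWTH_UNIT : NULL;
}

/* The only place that asks the C heap for memory. */
static void growHeap(void) {
    struct arena *a = calloc(1, sizeof(*a));   /* all cells start unmarked */
    assert(a != NULL);
    if (heap == NULL) {
        heap = a;
    } else {
        struct arena *last = heap;
        while (last->tail != NULL)
            last = last->tail;
        last->tail = a;
    }
    useArena(a);
}

/* Mark phase only: mark the live cells, then restart the sweep at the
   first arena. Returns the number of cells marked. */
static int gc(void) {
    int marked = 0;
    for (int i = 0; i < nroots; i++)
        if (!roots[i]->live) { roots[i]->live = 1; marked++; }
    useArena(heap);
    return marked;
}

/* Allocate without collecting: sweep forward from hp, unmarking live
   cells as they are skipped. Returns NULL at the end of the last arena. */
static struct markedsx *tryAlloc(void) {
    while (freeArena != NULL) {
        while (hp < heap_limit) {
            if (hp->live)
                (hp++)->live = 0;     /* in use: unmark and skip */
            else
                return hp++;          /* garbage or never used: reuse it */
        }
        if (freeArena->tail == NULL)
            return NULL;              /* stay parked at the end of the heap */
        useArena(freeArena->tail);
    }
    return NULL;
}

static struct markedsx *allocSx(void) {
    struct markedsx *c = tryAlloc();
    if (c == NULL) { gc();       c = tryAlloc(); }   /* try to reclaim garbage */
    if (c == NULL) { growHeap(); c = tryAlloc(); }   /* nothing reclaimed: grow */
    assert(c != NULL);
    return c;
}
```

Splitting out [[tryAlloc]] keeps the retry logic in [[allocSx]] to three steps, and the second failure of [[tryAlloc]] is exactly the case in which the collector reclaimed nothing.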

To implement the Stage~3 collector, I had to add or change 196~lines of code,
of which 54~lines are devoted to debugging. 

{\itshape Major debugging hints: If there is something wrong with your allocator
or collector, you will fail to mark some live cells, and programs will go
wrong at a later time when you try to re-use a cell that is already in use. A
good way to debug such problems is as follows: 

\begin{enumerate}
\item Call [[showRoots]] from your [[gc]] routine. 
\item Find a way of identifying each S-expression as garbage or non-garbage
(e.g., an extra field in [[struct markedsx]]). 
\item Sweep all S-expressions after each mark phase, noting that non-[[live]]
cells are garbage. 
\item Mark newly allocated cells as non-garbage.  
\item Write a procedure [[validate]] such that [[validate(s)]] halts with an
error message when [[s]] is garbage, and returns [[s]] otherwise.  
\item Use [[validate]] on all arguments and results of [[eval]]. If you ever
find an S-expression that's garbage, your collector is broken. 
\end{enumerate}

This technique will help you discover GC problems, albeit at some cost in
performance. } 

\item \textbf{Stage 4: Garbage-collector performance}. If the amount of live
data is just slightly less than the number of S-expressions in all the arenas,
the system will have to garbage-collect after every few allocations in order
to keep re-using those few available cells.  This ``garbage-collector
thrashing'' can make it very difficult to get any useful work done. The
\textit{ratio of heap size to live data} is traditionally called [[Gamma]]. 

A good memory manager should control the size of the heap. Create a variable
[[targetGamma]] in your program, and after each collection make sure the arena
list holds enough arenas that the ratio of heap size to live data is at least
[[targetGamma]]. That is, a [[targetGamma]] of 1.0 will make the collector
thrash, a [[targetGamma]] of 2.0 will offer twice as much heap as live data,
and so on.  

As you did in the closure tracing assignment, use a global variable
\texttt{target-gamma} in your Scheme interpreter so that you can control
[[targetGamma]] from within a Scheme program. Because the interpreters don't
have floating-point support, you'll have to use an integer, so let
\texttt{target-gamma} be 100~times [[targetGamma]], so that, for example,
executing [[(set target-gamma 175)]] will set [[targetGamma]] to 1.75. The
initial value of [[target-gamma]] should be~100, so that your Stage~4
interpreter will behave just like your Stage~3 interpreter.  
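
Because \texttt{target-gamma} is an integer, the heap-size target can be computed without floating point. The sketch below shows one way to do it; the function names [[arenasNeeded]] and [[arenasToAdd]] are ours, and keeping at least one arena when there is no live data is an assumption.

```c
#include <assert.h>

#define GROWTH_UNIT 24

/* Number of arenas needed so that heap size / live data >= targetGamma,
   where targetGammaPercent is the Scheme variable target-gamma
   (100 times the ratio). Integer arithmetic only, rounding up. */
static long arenasNeeded(long liveCells, long targetGammaPercent) {
    assert(liveCells >= 0 && targetGammaPercent > 0);
    long cells  = (liveCells * targetGammaPercent + 99) / 100;  /* ceil(live * gamma) */
    long arenas = (cells + GROWTH_UNIT - 1) / GROWTH_UNIT;      /* ceil(cells / unit) */
    return arenas > 0 ? arenas : 1;       /* assumption: never shrink to zero arenas */
}

/* Number of arenas to add after a collection that found liveCells live. */
static long arenasToAdd(long liveCells, long currentArenas, long targetGammaPercent) {
    long need = arenasNeeded(liveCells, targetGammaPercent);
    return need > currentArenas ? need - currentArenas : 0;
}
```

For example, with 3528 live cells in a heap of 165 arenas, a \texttt{target-gamma} of 175 calls for 258 arenas in all, so 93 more, while a \texttt{target-gamma} of 100 calls for none.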

\textit{Measure the amount of work done by the collector for different values
of [[targetGamma]]}.  Plot a graph showing \textit{work per allocation} as a
function of [[targetGamma]].  You may want to plot work vs measured [[Gamma]]
as well. \texttt{{\htmlTilde}cs152/bin/jgraph} is a good choice for plotting
graphs, or you can use [[gnuplot]] or do it by hand. 

To help with your measurements, we are providing code that will spit out test
cases using merge sort and insertion sort. If you call
\texttt{{\htmlTilde}cs152/bin/mergetest 18} it will print out a definition of
merge sort and a call to sort an 18-element array, so you can try 

\begin{PRE}
{\htmlTilde}cs152/bin/mergetest 56 {\htmlBar} ./a.out
\end{PRE}

for example, to try out your own interpreter. (And you can see that it works
with the regular Scheme interpreter.) A similar script called
\texttt{inserttest} lets you try the exact same test but using an insertion
sort. You will probably want to sort arrays of many different sizes as you
gather your measurements. 

\textit{Derive a formula} to express the cost of garbage collection as a
function of [[Gamma]].  The cost should be measured in \textit{GC work per
allocation}.  For a very simple approximation, assume a fixed percentage of
whatever is allocated becomes garbage by the next collection. \textit{How does
your formula compare with your measurements?} 
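
As a rough sanity check on such a formula, here is one possible steady-state estimate. It rests on assumptions that are ours rather than the assignment's: the amount of live data stays near $L$ cells from one collection to the next, the heap holds $H = \gamma L$ cells, and tracing one cell costs one unit of work. Each collection then traces about $L$ cells, and the allocator can hand out about $H - L$ cells before the next collection, so
\[
\frac{\mbox{mark work}}{\mbox{allocation}} \approx \frac{L}{H - L}
  = \frac{L}{\gamma L - L} = \frac{1}{\gamma - 1}.
\]
Under these assumptions the estimate grows without bound as $\gamma$ approaches~1 (thrashing) and equals~1 at $\gamma = 2$. It ignores the sweep work folded into [[allocSx]], and a model based on a fixed garbage percentage may give a different result.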

To implement Stage~4, I had to add or change only 9~lines of code in my
Stage~3 interpreter. 

If you are not able to get your own collector working, \textit{you can take
the Stage~4 measurements using my collector}, which you will find at
\texttt{{\htmlTilde}cs152/bin/scheme-ms}. (Naturally, I expect to get credit
in your README file :-) 

\item \textbf{Stage 5: Copying garbage collection}. In this stage, you will
\textit{convert the collector to a copying collector}. This will mean: 

\begin{itemize}
\item \textit{Do away with mark bits}. Instead you will need to add a new kind
of S-expression: [[FORWARDING_PTR]].  You will then allocate S-expressions
directly. 
\item \textit{Do away with the arenas}. The heap will become a single
contiguous space. Use the following variables to point into this space: 

\begin{tabular}{ll}
{[[from_space]]} & {The half of the heap where all the S-expressions are, and
in which you are currently allocating.  } \\
{[[to_space]]} & {The half of the heap which is currently unused.  } \\
{[[semispace_size]]} & {The number of cells in each semi-space.  } \\
{[[heap_limit]]} & {Equal to [[from_space + semispace_size]].  } \\
{[[hp]]} & {A pointer into from-space, such that every cell whose address is
less than [[hp]] is (at least potentially) in use, and every cell whose
address is at least [[hp]] is available for allocation. }

\end{tabular}

\item Your basic allocator will become simple again: 
<<example copying allocator>>=
Sx allocSx (void) 
{
    if (hp == heap_limit)
        gc();
    if (hp == heap_limit)
        growHeap();
    assert(hp < heap_limit);
    return hp++;
}
@ \item Instead of marking, your [[gc]] procedure will copy S-expressions from
from-space to to-space, then swap from-space and to-space. To manage the
allocation in to-space, you'll use the [[freeptr]] and [[scanptr]] discussed
in class. 

Most of the tracing procedures can be reused without modification. But the
tracing procedure for S-expressions will have to be replaced. Instead of 
<<old tracing procedure>>=
static void markSx(Sx s);
@ you will have 
<<new copying procedure>>=
static Sx forwardSx(Sx s) {
    if (to_space <= s && s < freeptr)
        return s;    /* reference to an object already in to-space */
    else if (s->ty == FORWARDING_PTR) 
        return s->u.forwarding_ptr;
    else {
        *freeptr = *s;
        s->ty = FORWARDING_PTR;
        s->u.forwarding_ptr = freeptr;
        return freeptr++;
    }
}
@ and \textit{you will have to update values of type [[Sx]] that appear in data
structures and on the root stack}. Be particularly careful to update the roots;
your code should look something like this: 
<<example>>=
switch (root->kind)
{
    case SX: *(Sx *)(root->u.sx) = forwardSx(*(Sx *)(root->u.sx)); break;
    <<other cases as before>>
}
@ Note also that in the code above, you \textit{do not forward a pointer that already
points to to-space}. This can't happen in a normal collector, but it can
happen in our collector because we may have objects that are not on the
managed heap (e.g., environments) which could contain pointers to to-space. 
\item When you enlarge the heap, you have to keep it contiguous. Because
there's no way to extend something contiguously using [[malloc]] and [[free]],
you will have to proceed as follows: 

\begin{enumerate}
\item Allocate a new, larger heap with [[malloc]]. 
\item Set up the to-space pointers to point into the new heap. 
\item Copy the live data from from-space (in the old heap) to to-space (in the
new heap). 
\item Get rid of the old heap, using [[free]]. 
\item Set up the from-space and to-space pointers suitably for the new heap. 
\end{enumerate}

When you grow your heap, always work with [[semispace_size]]. This way, when
you malloc something that holds [[2*semispace_size]] cells, you'll be
guaranteed to get two semi-spaces of the same size. 
\end{itemize}
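
The copy-and-scan loop that drives [[forwardSx]] can be sketched as follows. This is a toy under stated assumptions, not the course's collector: the three-tag [[struct sx]] is a stand-in for the real one, there is a single root passed by address, a null pointer plays the role of the empty list, and [[initHeap]] is a helper we invented for setup.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sx *Sx;
enum sxtype { NUM, LIST, FORWARDING_PTR };   /* hypothetical subset of the real tags */
struct sx {
    enum sxtype ty;
    union {
        int num;
        struct { Sx car, cdr; } list;
        Sx forwarding_ptr;
    } u;
};

static Sx from_space, to_space, hp, heap_limit, freeptr, scanptr;
static size_t semispace_size;

/* Allocate both semi-spaces as one contiguous block. */
static void initHeap(size_t cells) {
    semispace_size = cells;
    from_space = malloc(2 * cells * sizeof(struct sx));
    assert(from_space != NULL);
    to_space   = from_space + cells;
    hp         = from_space;
    heap_limit = from_space + cells;
}

/* Copy s into to-space unless that has already happened; return its new address. */
static Sx forwardSx(Sx s) {
    if (s == NULL)
        return NULL;                     /* assumption: NULL stands for '() here */
    if (to_space <= s && s < freeptr)
        return s;                        /* already in to-space */
    if (s->ty == FORWARDING_PTR)
        return s->u.forwarding_ptr;      /* already copied */
    *freeptr = *s;
    s->ty = FORWARDING_PTR;
    s->u.forwarding_ptr = freeptr;
    return freeptr++;
}

/* Collect starting from one root. Returns the number of cells copied. */
static size_t gc(Sx *root) {
    scanptr = freeptr = to_space;
    *root = forwardSx(*root);
    for (; scanptr < freeptr; scanptr++)         /* scan copied cells in order */
        if (scanptr->ty == LIST) {
            scanptr->u.list.car = forwardSx(scanptr->u.list.car);
            scanptr->u.list.cdr = forwardSx(scanptr->u.list.cdr);
        }
    size_t copied = (size_t)(freeptr - to_space);
    Sx old = from_space;                         /* swap the semi-spaces */
    from_space = to_space;
    to_space   = old;
    hp         = freeptr;                        /* allocation resumes after the copies */
    heap_limit = from_space + semispace_size;
    return copied;
}
```

The cells between [[scanptr]] and [[freeptr]] act as the work queue, so no separate stack or recursion is needed to trace the live data.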

When you grow the heap, \textit{be sure to do so in units of [[GROWTH_UNIT]]}
(i.e., be sure the total heap size is always a multiple of [[GROWTH_UNIT]]), so
as to facilitate comparisons with your mark-and-sweep collector. 
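
Keeping the total heap a multiple of [[GROWTH_UNIT]] while the two semi-spaces stay equal takes a little care when [[GROWTH_UNIT]] is odd. The helper below, whose name [[nextSemispaceSize]] is ours, picks the smallest larger semi-space size that satisfies both constraints. Growing by the smallest possible step is an assumption, and a collector that honors [[targetGamma]] may need to grow by more.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest semi-space size strictly larger than semispace_size such that
   the total heap (two semi-spaces) is a multiple of growth_unit. */
static size_t nextSemispaceSize(size_t semispace_size, size_t growth_unit) {
    assert(growth_unit > 0);
    /* 2*s is a multiple of an even unit when s is a multiple of unit/2;
       for an odd unit, s itself must be a multiple of the unit. */
    size_t step = (growth_unit % 2 == 0) ? growth_unit / 2 : growth_unit;
    return (semispace_size / step + 1) * step;
}
```

For instance, with a [[GROWTH_UNIT]] of 24 a semi-space of 12 cells grows to 24 (48 cells in total), and with a [[GROWTH_UNIT]] of 7 a semi-space of 7 cells grows to 14 (28 cells in total).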

\item \textbf{Stage 6: Copying-collector performance}. Add [[target-gamma]] to
your Stage~5 collector, and repeat the measurements and the derivation you
performed in Stage~4. Remember that [[Gamma]] is the ratio of the
\textit{total heap size} to the amount of live data, \textit{not} the ratio of
the size of a semi-space to the amount of live data. 
\end{enumerate}

We reserve the right to use different values of [[GROWTH_UNIT]] to exercise your
collector, so \textbf{you should be sure that your program works with any
positive value of [[GROWTH_UNIT]]}. 

It is quite difficult to test a garbage collector, so the bulk of your grade
will be based on your ability to convince us that you have implemented a
correct collector. Don't forget to explain what you have done, and don't leave
your explanation for the last minute! If you want to use \texttt{noweb} to
help explain what you are doing, you might want to start with the noweb source
for the stage 1 collector. 

\subsection*{Extra credit}

For \textit{massive extra credit}, you may solve any or all of these problems:


\begin{itemize}
\item \textbf{GENERATIONS}. Pick either your mark-and-sweep or your copying
collector, and make it generational. Re-do your plots of GC~work versus
[[Gamma]] and see what difference it makes. The Appel paper is fairly clear on
how to make a copying collector generational. To do the mark-and-sweep
collector, see the instructor or a TF. 
\item \textbf{CONSERVE}. All this root-tracking stinks, and these specialized
tracing functions aren't much fun either. Throw all that goo away, and fix
your mark-and-sweep collector so it finds roots and pointers conservatively.
Hints: 
\begin{itemize}
\item You can find the bounds of the C stack by looking at the address of a
local variable in [[main]] and the address of a local variable in [[allocSx]].

\item You will have to put a wrapper around [[malloc]] so that you can keep
track of every pointer value and of the size of the object it points to. This
means not just S-expressions, but \textit{all} values. 
\end{itemize}
Take measurements to
see how much, if any, additional garbage your conservative collector retains. 

For bonus points, you can manage all memory using your conservative collector,
not just S-expressions. 
\end{itemize}

\subsection*{What to submit}

\textit{Submit files for the four stages you implement (Stages 3 through 6),
plus a write-up on paper}. Your write-up should include: 

\begin{itemize}
\item Explanations of your implementations  
\item The proof required for Stage~3 (that the collector is called only when 
no S-expression is marked live).  
\item Your measurements (one hopes in the form of graphs)  for Stages 4 and 6.
 
\item The formulas you are to derive for Stages 4 and 6.  
\item An explanation of how well the formulas predict the measurements.  
\item Any conclusions you might care to draw about work per allocation and
comparisons of mark-and-sweep with copying. 
\end{itemize}

We expect you to explain the implementation of each of the phases separately.
Explain the ``whats'' and ``whys'' of your approach. Be concise and clear. 

If you are unable to complete the entire assignment, you can still get partial
credit for intermediate stages. 

If you wish, you may also turn in a file named \texttt{transcript} that
contains test cases for your solutions.