\subsection*{Garbage collection}
Software due Friday, October 22, at 12:10AM (a.k.a. Thursday night at midnight).\htmlBR
Write-ups due \textit{on paper} Friday, October 22, at 5:10PM in Maxwell Dworkin 231.

The purposes of this exercise are
\begin{itemize}
\item To review the implementation of mark-and-sweep garbage collection, which you have seen in CS~51.
\item To understand how to bound mark-and-sweep collection times by interleaving the sweep phase with the allocator.
\item To understand how to implement a simple copying collector.
\item To study and understand the effects of heap size on performance in both mark-and-sweep and copying collectors.
\end{itemize}
This is a long assignment and will count 100~points: double the value of a typical assignment. Before you begin work,
\begin{itemize}
\item Read and understand the heap interface from the Stage~1 collector.
\item Read through the entire assignment.
\end{itemize}
Your GC study is divided into six stages. We have implemented the first two stages. You should be able to implement each of the final four stages separately. Make sure you understand the entire stage before beginning work on it. The code you need to get started is in \texttt{{\htmlTilde}cs152/asst/gc}. If you have trouble with one of the stages, \textbf{seek help}. Don't just go on to the next stage and assume things will work out.
\begin{enumerate}
\item \textbf{Stage 1: Arenas}. In stage 1, we observe that [[mkSx]] is the only function that allocates S-expressions with [[malloc]]. We have changed that [[malloc]] to [[allocSx]]. [[allocSx]] uses a linked list of ``arenas.'' For our mark-and-sweep collector, the arena is the unit in which the memory-management code gets memory from the operating system.
The collection of all arenas will constitute the ``managed heap.'' Each arena is an array holding some constant number of S-expressions, with mark bits, e.g.:
<>=
#define GROWTH_UNIT 24
struct markedsx { /* S-expression with mark bit */
  struct sx sx;
  unsigned live:1;
};
struct arena {
  struct markedsx pool[GROWTH_UNIT];
  struct arena *tail;
};
@ Notice that arenas are chained on a linked list; the head of that list is called [[heap]]. One arena is always the current free arena, and the variable [[freeArena]] points to it. The heap pointer [[hp]] points to the next available S-expression in the current free arena. The limit pointer [[heap_limit]] points just beyond the last S-expression in the current free arena, so at any moment the number of free S-expressions in the current free arena is [[heap_limit - hp]].
<>=
struct markedsx *hp, *heap_limit;
struct arena *heap, *freeArena;
@ Here's a picture of what the arenas look like in the middle of an allocation cycle: \htmlBR [IMAGE]\htmlBR
Dark shaded areas are used, while light areas are available for allocation. Except for the current free arena, arenas are either entirely used or entirely available. [[heap_limit - hp]] gives the size of the light area in the current arena. Of course, this picture doesn't make sense for the stage 1 we have implemented---in the implementation we give you, the current arena is also always the last one. Also, once we have a garbage collector, ``used'' and ``available'' will be only approximations to what the dark and light areas represent.

We have implemented a function [[allocSx]], which takes an S-expression (that is, a free cell) from the current free arena. If there are no S-expressions remaining in the current arena, it calls [[growHeap]] to add a new arena to the tail of the list of all arenas.
\item \textbf{Stage 2: Roots}. The next step in preparing a collector is to identify the roots: all the places where pointers might lead to live S-expressions.
Another way to say it is that we have to know which [[Sx]]s might be used after a call to [[allocSx]]. These certainly include every [[Sx]] reachable from every [[rho]] of every invocation of [[eval]], as well as various temporary values and environments. Some S-expressions are also live during parsing. To solve this stage, we have broken the problem down by type.
\begin{itemize}
\item First, we have identified all the types that might contain a pointer leading to a value of type [[SEXP]]. Any of the following types can contain pointers to an [[SEXP]]:
\begin{itemize}
\item [[Sx]], through the [[car]] and [[cdr]] fields of [[u.list]].
\item [[Valuelist]], because it is a list of [[Sx]].
\end{itemize}
And then we can get pointers indirectly through values of the following types:
\begin{itemize}
\item [[Env]], because it contains a [[Valuelist]].
\item [[Ast]], through the [[VAL]] field.
\item [[Astlist]], as a list of [[Ast]]s.
\item [[Sx]] again, when it is a closure, which holds both [[Ast]] and [[Env]].
\item [[Exp]], through the [[sx]] field.
\item [[Explist]], as a list of [[Exp]]s.
\item [[Input]], as it may contain [[Ast]].
\end{itemize}
For each of these types, we have written a tracing function that follows pointers and marks S-expressions.
\item The next step is identifying the root set. Any value of any of the types named above is potentially a root if it could be live during a traced-state allocation. The set of roots therefore includes:
\begin{enumerate}
\item [[globalEnv]]
\item [[ipt]] in [[main]]
\item potentially, local variables and actual parameters of any function that calls a function that could allocate. (A function could allocate either if it calls [[allocSx]], or if it calls a function that could allocate.) These include:
\item every [[rho]] of every activation of [[eval]]
\item every [[s]] in every activation of [[evalList]] ([[s]] is the \textit{result} of a previous [[eval]] that's being saved for later use.
To understand the details, look at the source for [[evalList]].)
\item a great many roots in parsing and converting [[Exp]] to [[Ast]], culminating in the value returned by [[readInput]]
\end{enumerate}
The functions that call functions that might allocate are
\begin{quote}
\texttt{readInput}, \texttt{parseExp}, \texttt{parseEL}, \texttt{expToInput}, \texttt{expToAst}, \texttt{explistToAstlist}, \texttt{getBindings}, \texttt{expToSx}, \texttt{explistToSx}, \texttt{mkSYM}, \texttt{mkPRIM}, \texttt{main}, \texttt{eval}, \texttt{evalList}, \texttt{applyClosure}, \texttt{applyValueOp}, \texttt{applyArithOp}, \texttt{mkCLO}, \texttt{mkNUM}, \texttt{mkLIST}, and \texttt{mkSx}.
\end{quote}
In each of these functions, we must do work to ensure that every potential root is visible to the garbage collector. To make the job easier, we have used the following precondition:
\begin{quote}
\textit{When any of these functions is called, it is guaranteed that its arguments are reachable through some root already on the root stack. } We call a value \textit{traced} if it is reachable through some root on the stack. We can therefore state the requirements as follows: \textit{any of these functions may assume that its arguments are traced}, and \textit{the caller of any of these functions must guarantee that its arguments are traced}.
\end{quote}
Using this precondition simplifies things, and without it, we would find ourselves putting multiple copies of the same [[rho]] on the root stack over and over again. The code we supply you already includes the proper calls to [[pushRoot]] and [[popRoots]].
\end{itemize}
In stage 3, you'll have to trace pointers starting at all of the roots listed above. To help ensure that all roots are visible, we have implemented a new primitive, [[show-roots]], whose only purpose is to help debug the garbage collector. [[show-roots]] finds and prints out all the roots, with identifying commentary.
It should be possible to call [[(show-roots)]] at any time in any Scheme program. Our [[show-roots]] is incomplete, but you may wish to complete it if you have trouble. Here's an example that illustrates that [[s]] in [[evalList]] is live, with value~6:
<<*>>=
-> (+ (* 2 3) (begin (show-roots) 7))
ROOTS FOR GARBAGE COLLECTION:
S-expression () -- permanent representation of '()
S-expression T -- permanent representation of 'T
Env -- permanent global environment
Input -- the most recent input
S-expression -- function about to be applied in eval
S-expression 6 -- temporary value s in evalList
S-expression -- function about to be applied in eval
Values () -- arguments of function about to be applied
13
->
@ At the time [[(show-roots)]] is evaluated, [[evalList]] has already been called to evaluate [[(* 2 3)]], and the result of that evaluation ([[6]]) is sitting in the local variable [[s]]. Here's a similar example:
<>=
-> (set list3 (lambda (x y z) (cons x (cons y (cons z '())))))
-> (list3 (+ 1 2) (* 1 2) (/ 1 (begin (show-roots) 2)))
ROOTS FOR GARBAGE COLLECTION:
S-expression () -- permanent representation of '()
S-expression T -- permanent representation of 'T
Env -- permanent global environment
Input -- the most recent input
S-expression -- function about to be applied in eval
S-expression 3 -- temporary value s in evalList
S-expression 2 -- temporary value s in evalList
S-expression -- function about to be applied in eval
S-expression 1 -- temporary value s in evalList
S-expression -- function about to be applied in eval
Values () -- arguments of function about to be applied
(3 2 0)
->
@ You can see some other examples that use [[show-roots]] online. To see what [[show-roots]] is supposed to do in any particular function, you can run \texttt{{\htmlTilde}cs152/bin/scheme-ms}, which has the [[show-roots]] primitive.
@ \item \textbf{Stage 3: Basic mark-and-sweep collection}. In this stage, you will \textit{implement a mark-and-sweep collector for S-expressions}.
We have broken this exercise into several steps.
\begin{itemize}
\item Write a procedure [[gc]] that implements the mark phase of a mark-and-sweep garbage collector. At each collection, traverse the root set and mark each reachable S-expression as [[live]]. This should be dead easy, given that we've already provided the marking procedures. Your [[gc]] procedure will call the appropriate marking procedure for each of the roots, and it will reset [[hp]] and [[heap_limit]] to point back to the first arena.

Instrument your collector to gather statistics about memory usage and live data, and have the [[gc]] procedure print out statistics every time it is called. Such statistics should include the total number of S-expressions on the heap, the number that are live at the given collection, and the ratio of the heap size to live data. (Note that we're talking about the size of our managed heap, i.e., the collection of all arenas, not the real C~heap. Our heap is managed with [[allocSx]] and [[gc]]. The C~heap is managed with [[malloc]] and [[free]].) Use the following format for your output:
\begin{PRE}
[GC stats: heap size 3960, live data 3528, ratio 1.12]
\end{PRE}
Use exactly two digits after the decimal point to show the ratio. If there is no live data, your ratio should print as \texttt{infinite}.

Please also print some statistics about total memory usage. In the following format, tell us the total number of cells allocated and the current size of the heap:
\begin{PRE}
[Mem stats: allocated 244911, heap size 3960, ratio 61.85]
\end{PRE}
Please print this information after every 10th garbage collection, and when your interpreter exits. This will give you some intuition on how garbage collection allows you to reuse memory; in the example above, I was able to run in 60~times less memory than I could have without garbage collection.
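
The two-decimal and \texttt{infinite} rules above can be handled by one small formatting helper. What follows is only a sketch, not part of the supplied code: the names [[format_ratio]] and [[format_gc_stats]] are hypothetical, and it assumes that cell counts fit in a [[long]].

\begin{verbatim}
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not in the supplied code): writes num/den with
 * exactly two digits after the decimal point, or "infinite" if den is 0. */
void format_ratio(char *buf, size_t n, long num, long den) {
    assert(buf != NULL && n >= sizeof "infinite");
    if (den == 0)
        strcpy(buf, "infinite");
    else
        snprintf(buf, n, "%.2f", (double)num / (double)den);
}

/* Hypothetical helper: formats one GC-stats line in the required layout. */
void format_gc_stats(char *buf, size_t n, long heap_size, long live) {
    char ratio[32];
    format_ratio(ratio, sizeof ratio, heap_size, live);
    snprintf(buf, n, "[GC stats: heap size %ld, live data %ld, ratio %s]",
             heap_size, live, ratio);
}
\end{verbatim}

Called with a heap size of 3960 and 3528 live cells, [[format_gc_stats]] reproduces the sample line shown earlier; the same [[format_ratio]] could serve the memory-statistics line.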
Finally, when your interpreter exits, print out the total number of collections and the number of cells marked during those collections:
\begin{PRE}
[Total GC work: 745 collections traced 1988260 cells]
\end{PRE}
In order to gather these statistics, you'll have to change your marking procedures into functions that return the number of cells marked.
\item Instead of having separate unmark and sweep phases, integrate the unmark and sweep phases with [[allocSx]]. In particular:
\begin{itemize}
\item Have [[gc]] reset the [[freeArena]] pointer to the beginning of the list of all arenas, and reset [[hp]] and [[heap_limit]] to point to the start and end of that arena.
\item In [[allocSx]], instead of just taking the cell pointed to by~[[hp]], check to see if it is [[live]]. If so, skip past it (and mark it not live).
\item If [[allocSx]] can't find a free cell in the current [[freeArena]], advance to the next arena.
\end{itemize}
You may find it helpful to split [[allocSx]] into two functions: one that attempts to allocate without calling the garbage collector, but sometimes fails, and another that may call the first function, the collector, and [[growHeap]].
\item If [[allocSx]] gets all the way to the end of the last arena without finding a free cell, it should call [[gc]] to try to create more space.
\item As it stands, the system will loop forever if the collector can't reclaim any garbage. Fix [[allocSx]] to call [[growHeap]] in that circumstance.
\item \textit{Prove} that in this scheme, the garbage collector is called only when no S-expression is marked [[live]]. \textit{
\begin{itemize}
\item Hint \#1: begin with an invariant for the current arena pointed to by [[freeArena]]. You'll have one property for [[freeArena->pool[i]]] when [[&freeArena->pool[i] < hp]], and a different property for [[freeArena->pool[i]]] when [[&freeArena->pool[i] >= hp]]. Your [[growHeap]] procedure will have to establish this invariant for every new arena.
\item Hint \#2: extend the invariant to include arenas before and after the current [[freeArena]]. The invariants for these two kinds of arenas should look an awful lot like the invariants for [[freeArena->pool[i]]] where [[&freeArena->pool[i] < hp]] and [[&freeArena->pool[i] >= hp]].
\item Hint \#3: show that [[growHeap]], [[allocSx]], and [[gc]] all maintain this invariant.
\item Hint \#4: show that the conditions under which [[gc]] is called, together with the invariant, imply that [[gc]] is called only when no S-expression is marked [[live]].
\end{itemize}
} \textbf{You are likely to have an easier time with your implementation if you work on the proof first.} A thorough understanding of the invariants should make the implementation relatively easy.
\end{itemize}
Remember that the purpose of the garbage collector is to limit the amount of memory we have to request from the C~heap. If you call [[malloc]] anywhere but from within [[growHeap]], you are doing something wrong. (But you don't have to eliminate any existing calls to [[malloc]]; we've taken care of that in stage~1.)

To implement the Stage~3 collector, I had to add or change 196~lines of code, of which 54~lines are devoted to debugging.

\textit{Major debugging hints: If there is something wrong with your allocator or collector, you will fail to mark some live cells, and programs will go wrong at a later time when you try to re-use a cell that is already in use. A good way to debug such problems is as follows:
\begin{enumerate}
\item Call [[showRoots]] from your [[gc]] routine.
\item Find a way of identifying each S-expression as garbage or non-garbage (e.g., an extra field in [[struct markedsx]]).
\item Sweep all S-expressions after each mark phase, noting that non-[[live]] cells are garbage.
\item Mark newly allocated cells as non-garbage.
\item Write a procedure [[validate]] such that [[validate(s)]] halts with an error message when [[s]] is garbage, and returns [[s]] otherwise.
\item Use [[validate]] on all arguments and results of [[eval]]. If you ever find an S-expression that's garbage, your collector is broken.
\end{enumerate}
This technique will help you discover GC problems, albeit at some cost in performance. }
\item \textbf{Stage 4: Garbage-collector performance}. If the amount of live data is just slightly less than the number of S-expressions in all the arenas, the system will have to garbage-collect after every few allocations in order to keep re-using those few available cells. This ``garbage-collector thrashing'' can make it very difficult to get any useful work done. The \textit{ratio of heap size to live data} is traditionally called [[Gamma]].

A good memory manager should control the size of the heap. Create a [[targetGamma]] in your program so that after a collection, the arena list holds enough arenas so that the ratio of heap size to live data is at least [[targetGamma]]. That is, a [[targetGamma]] of 1.0 will make the collector thrash, a [[targetGamma]] of 2.0 will offer twice as much heap as live data, and so on. As you did in the closure tracing assignment, use a global variable \texttt{target-gamma} in your Scheme interpreter so that you can control [[targetGamma]] from within a Scheme program. Because the interpreters don't have floating-point support, you'll have to use an integer, so let \texttt{target-gamma} be 100~times [[targetGamma]], so that, for example, executing [[(set target-gamma 175)]] will set [[targetGamma]] to 1.75. The initial value of [[target-gamma]] should be~100, so that your Stage~4 interpreter will behave just like your Stage~3 interpreter.

\textit{Measure the amount of work done by the collector for different values of [[targetGamma]]}. Plot a graph showing \textit{work per allocation} as a function of [[targetGamma]]. You may want to plot work versus measured [[Gamma]] as well. [[{\htmlTilde}cs152/bin/jgraph]] is a good choice for plotting graphs, or you can use [[gnuplot]] or do it by hand.
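
As a worked illustration of the integer scaling (the numbers are hypothetical: they reuse the live-data count from the sample statistics line and a [[GROWTH_UNIT]] of~24, and they assume the heap grows in whole arenas), executing [[(set target-gamma 175)]] with 3528 live cells gives
\[
\mathit{targetGamma} = 175/100 = 1.75, \qquad
\mathit{heap} \ge 1.75 \times 3528 = 6174 \mbox{ cells},
\]
\[
\lceil 6174/24 \rceil = 258 \mbox{ arenas} = 6192 \mbox{ cells}, \qquad
\Gamma = 6192/3528 \approx 1.755 .
\]
The same test can be made without floating point by checking $100 \times \mathit{heap} \ge \mathit{target\mbox{-}gamma} \times \mathit{live}$; here $619200 \ge 617400$ holds for 258 arenas, while 257 arenas ($616800$) would fall short.
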
To help with your measurements, we are providing code that will spit out test cases using merge sort and insertion sort. If you call \texttt{{\htmlTilde}cs152/bin/mergetest 18} it will print out a definition of merge sort and a call to sort an 18-element array, so you can try \begin{PRE} {\htmlTilde}cs152/bin/mergetest 56 {\htmlBar} ./a.out \end{PRE} for example, to try out your own interpreter. (And you can see that it works with the regular Scheme interpreter.) A similar script called \texttt{inserttest} lets you try the exact same test but using an insertion sort. You will probably want to sort arrays of many different sizes as you gather your measurements. \textit{Derive a formula} to express the cost of garbage collection as a function of [[Gamma]]. The cost should be measured in \textit{GC work per allocation}. For a very simple approximation, assume a fixed percentage of whatever is allocated becomes garbage by the next collection. \textit{How does your formula compare with your measurements?} To implement Stage~4, I had to add or change only 9~lines of code in my Stage~3 interpreter. If you are not able to get your own collector working, \textit{you can take the Stage~4 measurements using my collector}, which you will find at \texttt{{\htmlTilde}cs152/bin/scheme-ms}. (Naturally, I expect to get credit in your README file :-) \item \textbf{Stage 5: Copying garbage collection}. In this stage, you will \textit{convert the collector to a copying collector}. This will mean: \begin{itemize} \item \textit{Do away with mark bits}. Instead you will need to add a new kind of S-expression: [[FORWARDING_PTR]]. You will then allocate S-expressions directly. \item \textit{Do away with the arenas}. The heap will become a single contiguous space. Use the following variables to point into this space: \begin{tabular}{ll} {[[from_space]]} & {The half of the heap where all the S-expressions are, and in which you are currently allocating. 
} \\
{[[to_space]]} & {The half of the heap which is currently unused. } \\
{[[semispace_size]]} & {The number of cells in each semi-space. } \\
{[[heap_limit]]} & {Equal to [[from_space + semispace_size]]. } \\
{[[hp]]} & {A pointer into from-space, such that every cell whose address is less than [[hp]] is (at least potentially) in use, and every cell whose address is at least [[hp]] is available for allocation. }
\end{tabular}
\item Your basic allocator will become simple again:
<>=
Sx allocSx (void) {
  if (hp == heap_limit)
    gc();
  if (hp == heap_limit)
    growHeap();
  assert(hp < heap_limit);
  return hp++;
}
@ \item Instead of marking, your [[gc]] procedure will copy S-expressions from from-space to to-space, then swap from-space and to-space. To manage the allocation in to-space, you'll use the [[freeptr]] and [[scanptr]] discussed in class. Most of the tracing procedures can be reused without modification. But the tracing procedure for S-expressions will have to be replaced. Instead of
<>=
static void markSx(Sx s);
@ you will have
<>=
static Sx forwardSx(Sx s) {
  if (to_space <= s && s < freeptr)
    return s; /* reference to an object already in to-space */
  else if (s->ty == FORWARDING_PTR)
    return s->u.forwarding_ptr;
  else {
    *freeptr = *s;
    s->ty = FORWARDING_PTR;
    s->u.forwarding_ptr = freeptr;
    return freeptr++;
  }
}
@ and \textit{you will have to update values of type [[Sx]] that appear in data structures and on the root stack}. Be particularly careful to update the roots; your code should look something like this:
<>=
switch (root->kind) {
case SX:
  *(Sx *)(root->u.sx) = forwardSx(*(Sx *)(root->u.sx));
  break;
<>
}
@ Note also that in the code above, you \textit{do not forward a pointer that already points to to-space}. This can't happen in a normal collector, but it can happen in our collector because we may have objects that are not on the managed heap (e.g., environments) which could contain pointers to to-space.
\item When you enlarge the heap, you have to keep it contiguous.
Because there's no way to extend something contiguously using [[malloc]] and [[free]], you will have to proceed as follows: \begin{enumerate} \item Allocate a new, larger heap with [[malloc]]. \item Set up the to-space pointers to point into the new heap. \item Copy the live data from from-space (in the old heap) to to-space (in the new heap). \item Get rid of the old heap, using [[free]]. \item Set up the from-space and to-space pointers suitably for the new heap. \end{enumerate} When you grow your heap, always work with [[semispace_size]]. This way, when you malloc something that holds [[2*semispace_size]] cells, you'll be guaranteed to get two semi-spaces of the same size. \end{itemize} When you grow the heap, \textit{be sure to do so in units of [[GROWTH_UNIT]]} (i.e., be sure the total heap size is always a multiple of [[GROWTH_UNIT]]), so as to facilitate comparisons with your mark-and-sweep collector. \item \textbf{Stage 6: Copying-collector performance}. Add [[target-gamma]] to your Stage~5 collector, and repeat the measurements and the derivation you implemented in Stage~4. Remember that [[Gamma]] is the ratio of the \textit{total heap size} to the amount of live data, \textit{not} the ratio of the size of a semi-space to the amount of live data. \end{enumerate} We reserve the right to use different values of [[GROWTH_UNIT]] to exercise your collector, so \textbf{you should be sure that your program works with any positive value of [[GROWTH_UNIT]]}. It is quite difficult to test a garbage collector, so the bulk of your grade will be based on your ability to convince us that you have implemented a correct collector. Don't forget to explain what you have done, and don't leave your explanation for the last minute! If you want to use \texttt{noweb} to help explain what you are doing, you might want to start with the noweb source for the stage 1 collector. 
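
To make the Stage~6 definition of [[Gamma]] concrete, here is a worked example with hypothetical numbers: a [[semispace_size]] of 1980 cells and 990 live cells after a collection.
\[
\Gamma = \frac{2 \times 1980}{990} = 4.00,
\qquad\mbox{whereas}\qquad
\frac{1980}{990} = 2.00 .
\]
Under these assumptions the heap meets a [[target-gamma]] of 400, even though only half of it is available for allocation at any time. The total of $3960 = 165 \times 24$ cells is also a multiple of the default [[GROWTH_UNIT]].
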
\subsection*{Extra credit}
For \textit{massive extra credit}, you may solve any or all of these problems:
\begin{itemize}
\item \textbf{GENERATIONS}. Pick either your mark-and-sweep or your copying collector, and make it generational. Re-do your plots of GC~work versus [[Gamma]] and see what difference it makes. The Appel paper is fairly clear on how to make a copying collector generational. To do the mark-and-sweep collector, see the instructor or a TF.
\item \textbf{CONSERVE}. All this root-tracking stinks, and these specialized tracing functions aren't much fun either. Throw all that goo away, and fix your mark-and-sweep collector so it finds roots and pointers conservatively. Hints:
\begin{itemize}
\item You can find the bounds of the C stack by looking at the address of a local variable in [[main]] and the address of a local variable in [[allocSx]].
\item You will have to put a wrapper around [[malloc]] so that you can keep track of every pointer value and of the size of the object it points to. This means not just S-expressions, but \textit{all} values.
\end{itemize}
Take measurements to see how much, if any, additional garbage your conservative collector retains. For bonus points, you can manage all memory using your conservative collector, not just S-expressions.
\end{itemize}
\subsection*{What to submit}
\textit{Submit files for all four stages, plus a write-up on paper}. Your write-up should include:
\begin{itemize}
\item Explanations of your implementations
\item The proof required for Stage~3 (that the collector is called only when no S-expression is marked live).
\item Your measurements (one hopes in the form of graphs) for Stages 4 and 6.
\item The formulas you are to derive for Stages 4 and 6.
\item An explanation of how well the formulas predict the measurements.
\item Any conclusions you might care to draw about work per allocation and comparisons of mark-and-sweep with copying.
\end{itemize}
We expect you to explain the implementation of each of the phases separately.
Explain the ``whats'' and ``whys'' of your approach. Be concise and clear. If you are unable to complete the entire assignment, you can still get partial credit for intermediate stages. If you wish, you may also turn in a file named \texttt{transcript} that contains test cases for your solutions.