The purposes of this exercise are
Your GC study is divided into five stages. We have implemented the first two stages. You should be able to implement each of the final three stages separately. Make sure you understand the entire stage before beginning work on it. The code you need to get started is in ~cs152/asst/gc.
If you have trouble with one of the stages, seek help. Don't just go on to the next stage and assume things will work out.
Each arena is an array holding some constant number of
S-expressions, with mark bits, e.g.:
<
We have
implemented a function [[allocSx]], which takes an S-expression (that is, a
free cell) from the
current free arena.
If there are no S-expressions remaining in the current arena, it
calls [[growHeap]] to add a new arena to the tail of the list of
all arenas.
To solve this stage, we have broken the problem down by type.
The functions that call functions that might allocate are
The code we supply you already includes the proper
calls to [[pushRoot]] and [[popRoots]].
In stage 3, you'll have to trace pointers starting at all of the roots
listed above.
To help ensure that all roots are visible, we have implemented a new
primitive, [[show-roots]], whose only purpose is to help debug the
garbage collector.
[[show-roots]] finds and prints out all the roots, with identifying commentary.
It should be possible to call [[(show-roots)]] at any time in any Scheme
program.
Our [[show-roots]] is incomplete, but you may wish to complete it if
you have trouble.
Here's an example that illustrates that [[s]] in [[evalList]] is live,
with value 6:
<<*>>=
-> (+ (* 2 3) (begin (show-roots) 7))
ROOTS FOR GARBAGE COLLECTION:
S-expression () -- permanent representation of '()
S-expression T -- permanent representation of 'T
Env -- permanent global environment
Input -- the most recent input
S-expression
Here's a similar example:
<
You can see some
other examples that use [[show-roots]] online.
To see what [[show-roots]] is supposed to do in any particular function, you can run
~cs152/bin/scheme-ms, which has the [[show-roots]] primitive.
@
Instrument your collector to gather statistics about memory usage
and live data, and have the [[gc]] procedure print out
statistics every time it is called. Such statistics should include
the total number of S-expressions on the heap, the number that are
live at the given collection, and the ratio of the heap size to
live data.
(Note that we're talking about the size of our managed heap, i.e.,
the collection of all arenas, not the real C heap. Our heap is
managed with [[allocSx]] and [[gc]]. The C heap is managed
with [[malloc]] and [[free]].)
Use the following format for your output:
Please also print some statistics about total memory usage.
In the following format, tell us the total number of cells allocated and
the current size of the heap:
Finally, when your interpreter exits, print out the total number of
collections and the number of cells marked during those collections:
In order to gather these statistics, you'll have to change your
marking procedures into functions that return the number of cells
marked.
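As a rough illustration of a marking procedure that returns a count, here is a minimal sketch. The [[Sx]] layout and the name [[markSx]] are assumptions made for this example; the real S-expression type in the supplied code has more cases. The point is only that a cell already marked contributes zero, so shared structure and cycles are counted once.

```c
#include <stddef.h>

/* Hypothetical cell layout, for illustration only: the real
   S-expression type in the supplied code will differ. */
typedef struct Sx Sx;
struct Sx {
    int marked;      /* mark bit */
    Sx *car, *cdr;   /* NULL when the cell holds no pointer */
};

/* Marks every unmarked cell reachable from s and returns how many
   cells this call newly marked. Cells that are already marked add 0,
   so shared structure and cycles are counted exactly once. */
long markSx(Sx *s) {
    long n = 0;
    while (s != NULL && !s->marked) {
        s->marked = 1;
        n++;
        n += markSx(s->car);   /* recurse on the car */
        s = s->cdr;            /* loop on the cdr to limit stack depth */
    }
    return n;
}
```

Summing the results over all roots gives the number of cells traced in one collection.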
Remember that the purpose of the garbage collector is to limit the
amount of memory we have to request from the C heap. If you
call [[malloc]] anywhere but from within [[growHeap]], you are doing
something wrong. (But you don't have to eliminate any existing calls
to [[malloc]]; we've taken care of that in stage 1.)
To implement the Stage 3 collector, I had to add or change
196 lines of code, of which
54 lines are devoted to debugging.
Major debugging hints:
If there is something wrong with your allocator or collector,
you will fail to mark some live cells, and
programs will go wrong at a later time when you try to re-use a cell
that is already in use.
A good way to debug such problems is as follows:
A good memory manager should control the size of the heap.
Create a [[targetGamma]] in your program so that after a
collection, the arena list holds enough arenas so that the ratio of
heap size to live data is at least [[targetGamma]].
That is, a [[targetGamma]] of 1.0 will make the collector thrash, a
[[targetGamma]] of 2.0 will offer twice as much heap as live data, and so
on.
As you did in the closure tracing assignment, use a global variable
[[target-gamma]] in your Scheme interpreter so that you can
control [[targetGamma]] from within a Scheme program.
Because the interpreters don't have floating-point support, you'll
have to use an integer, so let [[target-gamma]] be
100 times [[targetGamma]], so that, for example,
executing [[(set target-gamma 175)]] will set
[[targetGamma]] to 1.75.
The initial value of [[target-gamma]] should be 100, so that
your Stage 4 interpreter will behave just like your Stage 3 interpreter.
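The arithmetic can stay entirely in integers. The sketch below computes how many arenas the heap must hold after a collection. The function name [[arenasNeeded]] and the value given to [[ARENASIZE]] are assumptions made for this example, and the minimum of one arena is a choice of the sketch, not a requirement stated above.

```c
#define ARENASIZE 360   /* assumed value, for illustration only */

/* Returns the number of arenas needed so that
   (heap size) / (live data) >= target_gamma_x100 / 100,
   where heap size = arenas * ARENASIZE.
   Returns at least 1 so that allocation can always proceed. */
long arenasNeeded(long live, long target_gamma_x100) {
    /* ceil(live * gamma), computed without floating point */
    long cells = (live * target_gamma_x100 + 99) / 100;
    /* round up to a whole number of arenas */
    long arenas = (cells + ARENASIZE - 1) / ARENASIZE;
    return arenas < 1 ? 1 : arenas;
}
```

With these assumed numbers, 3528 live cells and [[target-gamma]] set to 175 call for 6174 cells, which rounds up to 18 arenas.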
Measure the amount of work done by the collector for different
values of [[targetGamma]]. Plot a graph showing work per allocation
as a function of [[targetGamma]]. You may want to plot work vs
measured [[Gamma]] as well.
[[~cs152/bin/jgraph]]
is a good choice for plotting graphs, or you can use [[gnuplot]] or
do it by hand.
To help with your measurements, we are providing code that will spit
out test cases using merge sort and insertion sort.
Derive a formula to express the cost of garbage collection as a
function of [[Gamma]]. The cost should be measured in GC work
per allocation. For a very simple approximation, assume a fixed
percentage of whatever is allocated becomes garbage by the next
collection.
How does your formula compare with your measurements?
To implement Stage 4, I had to add or change only 9 lines
of code in my Stage 3 interpreter.
If you are not able to get your own collector working,
you can take the Stage 4 measurements using my
collector, which you will find at
~cs152/bin/scheme-ms.
(Naturally, I expect to get credit in your README file :-)
Most of the tracing procedures can be reused without modification.
But the tracing procedure for S-expressions will have to be replaced.
Instead of
<
Dark shaded areas are used, while light areas are available for allocation.
Except for the current free arena, arenas are either entirely used or
entirely available.
[[heap_limit - hp]] gives the size of the light area in the current arena.
Of course, this picture doesn't make sense for the stage 1 we have
implemented---in the implementation we give you, the current arena is
also always the last one.
Also, once we have a garbage collector, ``used'' and ``available''
will be
only approximations to what the dark and light areas represent.
And then we can get pointers indirectly through values of the following types:
For each of these types, we have written a tracing function that
follows pointers and marks S-expressions.
readInput, parseExp,
parseEL, expToInput, expToAst,
explistToAstlist, getBindings, expToSx,
explistToSx, mkSYM, mkPRIM,
main, eval, evalList,
applyClosure, applyValueOp, applyArithOp,
mkCLO, mkNUM, mkLIST, and mkSx.
In each of these functions, we must do work to ensure that every
potential root is visible to the garbage collector.
To make the job easier, we have used the following precondition:
When any of these functions is called, it is guaranteed that its
arguments are reachable through some root already on the root stack.
Using this precondition simplifies things, and without it, we would
find ourselves putting multiple copies of the same [[rho]] on the root stack
over and over again.
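To make the precondition concrete, here is a small sketch of the discipline it allows. Everything in it is an assumption made for illustration: the signatures given to [[pushRoot]] and [[popRoots]], the [[Sx]] layout, the stand-in allocator, and the function [[pairWithFresh]] are not taken from the supplied code. The argument is never pushed, because the caller has already made it reachable; only a local that holds a fresh result across a later allocation is pushed.

```c
#include <assert.h>
#include <stddef.h>

#define MAXROOTS 1024   /* assumed capacity */

typedef struct Sx Sx;   /* hypothetical cell layout */
struct Sx { Sx *car, *cdr; };

static Sx **rootstack[MAXROOTS];   /* addresses of locals that hold roots */
static size_t nroots = 0;

/* Assumed signatures; the supplied code may differ. */
void pushRoot(Sx **p)   { assert(nroots < MAXROOTS); rootstack[nroots++] = p; }
void popRoots(size_t n) { assert(n <= nroots); nroots -= n; }

/* Stand-in allocator so the sketch is self-contained. A real
   allocSx might run the collector on any call. */
static Sx pool[16];
static size_t next_cell = 0;
static Sx *allocSx(void) {
    assert(next_cell < 16);
    Sx *s = &pool[next_cell++];
    s->car = s->cdr = NULL;
    return s;
}

/* Builds a pair whose car is a fresh cell and whose cdr is arg. */
Sx *pairWithFresh(Sx *arg) {
    Sx *first = allocSx();   /* arg is safe here by the precondition */
    pushRoot(&first);        /* first must survive the next allocation */
    Sx *pair = allocSx();
    pair->car = first;
    pair->cdr = arg;
    popRoots(1);             /* leave the root stack as we found it */
    return pair;
}
```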
[GC stats: heap size 3960, live data 3528, ratio 1.12]
Use exactly two digits after the decimal point to show the ratio.
If there is no live data, your ratio should print as infinite.
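One way to produce this line is sketched below. The function name [[formatGCStats]] is an assumption made for this example, and the sketch also assumes that an infinite ratio is printed as the literal word "infinite". Formatting into a buffer first makes the output easy to check before it is printed.

```c
#include <stdio.h>
#include <string.h>

/* Formats one statistics line into buf and returns what snprintf
   returns. When live is 0 the ratio is written as the word
   "infinite" (an assumption of this sketch); otherwise it is written
   with exactly two digits after the decimal point. */
int formatGCStats(char *buf, size_t size, long heap, long live) {
    if (live == 0)
        return snprintf(buf, size,
                        "[GC stats: heap size %ld, live data %ld, ratio infinite]",
                        heap, live);
    return snprintf(buf, size,
                    "[GC stats: heap size %ld, live data %ld, ratio %.2f]",
                    heap, live, (double)heap / (double)live);
}
```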
[Mem stats: allocated 244911, heap size 3960, ratio 61.85]
Please print this information after every 10th garbage collection,
and when your interpreter exits.
This will give you some intuition on how garbage collection allows
you to reuse memory; in the example above, I was able to run in
60 times less memory than I could have without garbage collection.
[Total GC work: 745 collections traced 1988260 cells]
You may find it helpful to split [[allocSx]] into two functions:
one that attempts to allocate without calling the garbage collector,
but sometimes fails, and another that may call the first function, the
collector, and [[growHeap]].
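The split might look roughly like the sketch below. The name [[tryAllocSx]], the [[Sx]] layout, the tiny [[ARENASIZE]], and the stand-in bodies of [[gc]] and [[growHeap]] are all assumptions made so the example is self-contained. In particular, this stand-in collector reclaims nothing and the sketch keeps no list of arenas, both of which a real mark-and-sweep heap needs.

```c
#include <stdlib.h>
#include <stddef.h>

#define ARENASIZE 4   /* tiny assumed value so the sketch is easy to exercise */

typedef struct Sx { int marked; } Sx;   /* hypothetical cell */

static Sx *hp = NULL, *heap_limit = NULL;   /* next free cell, end of current arena */
static long collections = 0, arenas = 0;

/* Stand-in for the real collector: it only counts calls. */
static void gc(void) { collections++; }

/* The only place that calls malloc. */
static void growHeap(void) {
    hp = malloc(ARENASIZE * sizeof(*hp));
    if (hp == NULL) abort();
    heap_limit = hp + ARENASIZE;
    arenas++;
}

/* Attempts to allocate without collecting; returns NULL on failure. */
static Sx *tryAllocSx(void) {
    if (hp == NULL || hp == heap_limit) return NULL;
    hp->marked = 0;
    return hp++;
}

/* Never returns NULL: fast path first, then collect, then grow. */
Sx *allocSx(void) {
    Sx *s = tryAllocSx();
    if (s == NULL) { gc(); s = tryAllocSx(); }
    if (s == NULL) { growHeap(); s = tryAllocSx(); }
    return s;
}
```

Keeping the failing path separate means the decision about when to collect and when to grow lives in exactly one function.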
You are likely to have an easier time with your implementation if you
work on the proof first.
A thorough understanding of the invariants should make the
implementation relatively easy.
This technique will help you discover GC problems, albeit at some cost
in performance.
If you call
~cs152/bin/mergetest 18
it will print
out a definition of merge sort and a call to sort an 18-element array,
so you can try
~cs152/bin/mergetest 56 | ./a.out
for example, to try out your own interpreter.
(And you can see that it works with the regular Scheme interpreter.)
A similar script called inserttest
lets you try the exact
same test but using an insertion sort.
You will probably want to sort arrays of many different sizes as you
gather your measurements.
When you grow the heap, be sure to do so in units of
[[ARENASIZE]] (i.e., be sure the total heap size is always a
multiple of [[ARENASIZE]]), so as to facilitate comparisons with your
mark-and-sweep collector.
[[from_space]] The half of the heap where all the S-expressions are, and in which you are currently allocating.
[[to_space]] The half of the heap which is currently unused.
[[semispace_size]] The number of cells in each semi-space.
[[heap_limit]] Equal to [[from_space + semispace_size]].
[[hp]] A pointer into from-space, such that every cell whose address is less than [[hp]] is (at least potentially) in use, and every cell whose address is at least [[hp]] is available for allocation.
When you grow your heap, always work with [[semispace_size]].
This way, when you malloc something that holds [[2*semispace_size]] cells, you'll be
guaranteed to get two semi-spaces of the same size.
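A minimal sketch of carving two semi-spaces out of one allocation follows. The function name [[newSemispaces]], the [[Sx]] layout, and the value of [[ARENASIZE]] are assumptions made for this example. The sketch also assumes that each semi-space is a whole number of arenas, and it omits copying live data out of the old heap and freeing the old block.

```c
#include <stdlib.h>
#include <stddef.h>

#define ARENASIZE 360   /* assumed value, for illustration only */

typedef struct Sx { struct Sx *forward; } Sx;   /* hypothetical cell */

static Sx *from_space, *to_space, *heap_limit, *hp;
static size_t semispace_size;

/* Allocates a fresh heap of two equal semi-spaces with one malloc.
   Each semi-space holds n_arenas * ARENASIZE cells. Returns 0 on
   success and -1 if malloc fails, in which case nothing changes. */
int newSemispaces(size_t n_arenas) {
    size_t size = n_arenas * ARENASIZE;
    Sx *block = malloc(2 * size * sizeof(*block));
    if (block == NULL) return -1;
    semispace_size = size;
    from_space = block;
    to_space   = block + semispace_size;
    heap_limit = from_space + semispace_size;
    hp         = from_space;   /* the whole of from-space is available */
    return 0;
}
```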
It is quite difficult to test a garbage collector, so the bulk of your grade will be based on your ability to convince us that you have implemented a correct collector. Don't forget to explain what you have done, and don't leave your explanation for the last minute! If you want to use noweb to help explain what you are doing, you might want to start with the noweb source for the stage 1 collector.
For bonus points, you can manage all memory using your conservative collector, not just S-expressions.
If you are unable to complete the entire assignment, you can still get partial credit for intermediate stages.
If you wish, you may also turn in a file named transcript that contains test cases for your solutions.