Using Object Lifetime and Call Stack Profile Data To Improve Garbage Collection Performance
Garbage collectors in runtime systems such as Java and .NET typically use heap fullness as an indicator of when to trigger a garbage collection (GC), collecting only when the heap becomes full. With modern collectors, time spent in GC is proportional to the number of live objects on the heap. Because the choice of collection time does not take into account the number of live objects on the heap, this collection may occur when there are a large number of live objects on the heap, resulting in a large amount of time spent in collection.
If the garbage collector could predict the number of live objects on the heap during runtime, it could collect before the heap becomes full, at a time when there are few live objects on the heap. Less total time would be spent in GC, and each GC might also be shorter.
Matthew Hertz designed and implemented the Merlin algorithm to determine the exact death times of objects in the open-source Java virtual machine JikesRVM. We augment his object lifetime tracing system by adding data from the runtime system to log method entry and exit. Now, in addition to the exact death times of all objects in the system, we also know the call stack when any object died. Using this data, we can define an optimal GC strategy and a performance metric: total number of live objects across all collections.
We propose to improve garbage collection performance by profiling programs using this tracing system. We plan to use the data, along with machine learning techniques, to determine the best time to collect. We hypothesize an improvement in garbage collection performance versus the standard "wait-until-heap-is-full" strategy.