Bigtable merge compaction
NoSQL databases based on log-structured merge trees are widely used
for massive data storage and real-time web applications. Yet important
aspects of these data structures are not well understood. For example,
they write most of their data to a collection of files on disk,
meanwhile periodically compacting subsets of these files. A
compaction policy must choose which files to compact, and when to
compact them, without knowing the future workload. These choices can
affect computational costs by orders of magnitude, and existing
compaction policies can be far from optimal. I’ll discuss recent work
designing and analyzing provably good compaction policies for the
Google Bigtable NoSQL database.
Bio: Professor Neal Young obtained his PhD in 1991 at Princeton University under Robert Tarjan. He taught at Dartmouth College and worked at Akamai Technologies in Cambridge before taking his current position at the University of California, Riverside, where he is a full professor. He studies approximation algorithms for combinatorial-optimization problems, including NP-hard problems, linear programming, and online problems such as Bigtable merge compaction.