Bigtable merge compaction
Abstract
NoSQL databases based on log-structured merge trees are widely used
for massive data storage and real-time web applications. Yet important
aspects of these data structures are not well understood. For example,
they write most of their data to a collection of files on disk,
meanwhile periodically compacting subsets of these files. A
compaction policy must choose which files to compact, and when to
compact them, without knowing the future workload. These choices can
affect computational costs by orders of magnitude, and existing
compaction policies can be far from optimal. I’ll discuss recent work
designing and analyzing provably good compaction policies for the
Google Bigtable NoSQL database.
Bio: Professor Neal Young obtained his PhD in 1991 at Princeton
University under Robert Tarjan. He taught at Dartmouth College and
worked at Akamai Technologies in Cambridge before taking his current
position at the University of California, Riverside, where he is a
full professor. He studies approximation algorithms for
combinatorial-optimization problems, including NP-hard problems,
linear programming, and online problems such as Bigtable merge compaction.