Bigtable merge compaction

March 27, 2017
Halligan 111A
Speaker: Neal Young, University of California, Riverside
Host: Robert Jacob


NoSQL databases based on log-structured merge trees are widely used for massive data storage and real-time web applications. Yet important aspects of these data structures are not well understood. For example, they write most of their data to a collection of files on disk, meanwhile periodically compacting subsets of these files. A compaction policy must choose which files to compact, and when to compact them, without knowing the future workload. These choices can affect computational costs by orders of magnitude, and existing compaction policies can be far from optimal. I’ll discuss recent work designing and analyzing provably good compaction policies for the Google Bigtable NoSQL database.

Bio: Professor Neal Young obtained his PhD in 1991 at Princeton University under Robert Tarjan. He taught at Dartmouth College and worked at Akamai Technologies in Cambridge before taking his current position at the University of California, Riverside, where he is a full professor. He studies approximation algorithms for combinatorial-optimization problems, including NP-hard problems, linear programming, and online problems such as Bigtable merge compaction.