Building a Bloom Filter with Fewer Hash Functions

February 2, 2005

2:50 pm - 4:00 pm

Halligan 111

Speaker: Michael Mitzenmacher, Harvard University

Abstract

A Bloom filter is a simple space-efficient randomized data structure for representing a set that supports membership queries; in exchange for the space savings, there is some probability of a false positive. Bloom filters have proven very useful for various database and network applications.

Bloom filters generally use k hash functions for some constant k; typical values of k range from 4 to 10. We show (both mathematically and empirically) that one can use 2 hash function to mimic k hash functions while maintaining the same probability of a false positive for a Bloom filter. This yields implementations that are more efficient and may require less randomness.

We describe Bloom filters, the mathematics behind them, and some of their many applications, and then explain our improved construction.