What is a bloom filter?

A Bloom filter is a probabilistic data structure designed to support membership queries on a set in a space-efficient manner allowing false positives but no false negatives.

It can indicate if an element is definitely not in a set, or probably in a set. Bloom filters have very compact representations compared to storing the full set explicitly.

Bloom filters rely on hash functions and interval arithmetic to map set elements to array bits efficiently. They are used in systems like databases, networks and browsers for scalable set representation. Related structures like B-trees, distributed hash tables, and skip lists have comparable tradeoffs.

How does it work?

A Bloom filter consists of a bit vector initialized to 0s and uses multiple hash functions to map elements to random index positions. To add an item, the bits at the hashed indexes are flipped to 1. To check membership, you check if the bits are set at all the hashed positions.

Why is it important? Where is it used?

Bloom filters enable space-efficient representation of sets and fast membership checks with adjustable error tradeoffs. Uses include network routing, databases, caching layers, data streams and anywhere space matters.

The compact probabilistic representation enables scaling massive sets with small memory overhead. The tolerance for false positives provides the efficiency gains.

FAQ

What are the key properties of a Bloom filter?

Space efficiency - Uses compact bit vectors

O(1) add and lookup time complexity

Zero false negatives; adjustable false positive rate

No need to store full set explicitly

What are the disadvantages of Bloom filters?

False positives are possible

No exact count of members

Rebuilding filter needed if full set changes

Cryptographic applications require more advanced constructions

What parameters can be tuned in a Bloom filter?

Key parameters:

Vector size

Hash functions used

Number of hash functions

What are common applications of Bloom filters?

Databases: Join filtering, deduplication

Networking: Routers, CDNs

Security: spellchecking, attack detection

Genome assembly

Distributed caches

References

[Book] Handbook of Data Structures and Applications, 2nd Edition

[Article] Compressed bloom filters

[Post] Bloom Filters

[Post] Running Windowing Queries in Stream Processing

[Post] Probabilistic Data Structures in Streaming: Count-Min Sketch

Bloom Filter

What is a bloom filter?

How does it work?

Why is it important? Where is it used?

FAQ

What are the key properties of a Bloom filter?

What are the disadvantages of Bloom filters?

What parameters can be tuned in a Bloom filter?

What are common applications of Bloom filters?

References

Related Topics

Count Min Sketch

Distributed Hash Table

Skip List

B-tree