What is consistent hashing?

Consistent hashing is a distributed hashing scheme that uses hash rings to assign keys to servers such that only a few keys need to be remapped when servers are added or removed. This enables caching and scaling with minimal turnover.

Hash rings map keys and servers around a circular space. Servers are assigned ranges based on their hashes. Keys are assigned to the nearest server clockwise from their hash location.

When servers are scaled in/out, only keys near the change points need rebalancing. Contrast with standard hashing where all keys remap randomly, causing mass cache turnover.

Consistent hashing requires balanced, randomized hashes. Algorithms like rendezvous hashing improve on basic consistent hashing. Fast non-crypto hashes like xxHash, MurmurHash and FNV are commonly used.

How does consistent hashing work?

With consistent hashing, servers and keys map to the same hash space. Keys are assigned to the nearest server clockwise on the ring.

When servers are added/removed, only keys near the change points get remapped. Most keys remain assigned to their original server.

Virtual replicas improve the distribution. Hashing algorithms like MD5, SHA-1 are used.

Why is consistent hashing important? Where is it used?

Consistent hashing enables scaling, load balancing and high availability for distributed data systems. Only minimal remapping is needed when the server pool changes.

Use cases include distributed caches, databases, cloud storage systems and content delivery networks like Akamai that require scalability.

FAQ

How does consistent hashing improve on basic hashing?

It minimizes key turnover when servers are added or removed compared to naive hash assignment. This is critical for distributed data systems.

What problem does consistent hashing solve?

It provides a hash distribution algorithm that reduces cache invalidation and rebalancing when scaling up servers and handling failures.

What are some examples of consistent hashing uses?

Distributed caching systems like memcached

NoSQL databases and datastores

Peer to peer networks and distributed ledgers

Load balancers

What are the main challenges with consistent hashing?

Non-uniform distributions and clustering

Rebalancing virtual points when nodes leave

Node capacity imbalance

Cryptographic attacks on hashes

References

[Book] Designing Data-Intensive Applications

[Article] Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web

[Article] The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing

[Post] Sliding Window Hash Join: Efficiently Joining Infinite Streams with Order Preservation

[Post] General-purpose Stream Joins via Pruning Symmetric Hash Joins

Consistent Hashing

What is consistent hashing?

How does consistent hashing work?

Why is consistent hashing important? Where is it used?

FAQ

How does consistent hashing improve on basic hashing?

What problem does consistent hashing solve?

What are some examples of consistent hashing uses?

What are the main challenges with consistent hashing?

References

Related Topics

xxHash

MurmurHash

FNV Hash