Consistent Hashing

Algorithms/Data Structures
Updated on:
August 20, 2024

What is consistent hashing?

Consistent hashing is a distributed hashing scheme that uses hash rings to assign keys to servers such that only a few keys need to be remapped when servers are added or removed. This enables caching and scaling with minimal turnover.

Hash rings map keys and servers around a circular space. Servers are assigned ranges based on their hashes. Keys are assigned to the nearest server clockwise from their hash location.

When servers are scaled in/out, only keys near the change points need rebalancing. Contrast with standard hashing where all keys remap randomly, causing mass cache turnover.

Consistent hashing requires balanced, randomized hashes. Algorithms like rendezvous hashing improve on basic consistent hashing. Fast non-crypto hashes like xxHashMurmurHash and FNV are commonly used.

How does consistent hashing work?

With consistent hashing, servers and keys map to the same hash space. Keys are assigned to the nearest server clockwise on the ring.

When servers are added/removed, only keys near the change points get remapped. Most keys remain assigned to their original server.

Virtual replicas improve the distribution. Hashing algorithms like MD5, SHA-1 are used.

Why is consistent hashing important? Where is it used?

Consistent hashing enables scaling, load balancing and high availability for distributed data systems. Only minimal remapping is needed when the server pool changes.

Use cases include distributed caches, databases, cloud storage systems and content delivery networks like Akamai that require scalability.

FAQ

How does consistent hashing improve on basic hashing?

It minimizes key turnover when servers are added or removed compared to naive hash assignment. This is critical for distributed data systems.

What problem does consistent hashing solve?

It provides a hash distribution algorithm that reduces cache invalidation and rebalancing when scaling up servers and handling failures.

What are some examples of consistent hashing uses?

  • Distributed caching systems like memcached
  • NoSQL databases and datastores
  • Peer to peer networks and distributed ledgers
  • Load balancers

What are the main challenges with consistent hashing?

  • Non-uniform distributions and clustering
  • Rebalancing virtual points when nodes leave
  • Node capacity imbalance
  • Cryptographic attacks on hashes

References:

Related Entries

xxHash

xxHash is an extremely fast non-cryptographic hash algorithm focused on speed and efficiency for checksums and hash tables.

Read more ->
MurmurHash

MurmurHash is a series of fast non-cryptographic hash functions optimized for hash tables and CPU cache performance.

Read more ->
FNV Hash

The FNV hash is a fast, simple non-cryptographic hash function that uses modular arithmetic operations to achieve good distribution.

Read more ->

Get early access to AI-native data infrastructure