Consistent Hashing
Consistent hashing is a fundamental technique for distributing data across distributed systems while minimizing data movement when nodes are added or removed. It's the backbone of modern distributed caches, databases, and load balancers.
The Problem: Why We Need Consistent Hashing
Traditional Hashing Approach
Given n cache nodes, a naive approach uses hash(key) % n to route requests:
node = hash(user_id) % n
Problem: When n changes (adding/removing nodes), almost all keys remap to different nodes:
- 10 nodes → add 1 node: ~91% of keys remap (only ~1/11 of keys keep the same node)
- 10 nodes → remove 1 node: ~90% of keys remap, and the ~10% stored on the removed node are lost outright
- Net effect: a massive simultaneous cache-miss avalanche hits the backing database
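A quick simulation makes the remap rate concrete. This is a minimal sketch; the key count, key naming, and the SHA-1-derived hash are illustrative choices, not prescribed by the text:

```python
import hashlib

def h(key: str) -> int:
    # Stable 64-bit hash from a SHA-1 digest (any uniform hash works here)
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

def remap_fraction(num_keys: int, old_n: int, new_n: int) -> float:
    # Fraction of keys whose node changes when the cluster resizes under mod-n
    moved = sum(h(f"key{i}") % old_n != h(f"key{i}") % new_n
                for i in range(num_keys))
    return moved / num_keys

# Growing from 10 to 11 nodes remaps roughly 10/11 ≈ 91% of keys
print(f"{remap_fraction(20_000, 10, 11):.1%}")
```

Only keys whose hash gives the same remainder mod 10 and mod 11 stay put, which is about 1 key in 11.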
This is catastrophic for distributed systems because:
- Hot data becomes cold simultaneously
- Origin databases get overwhelmed by cache refilling
- System latency spikes during resharding
- Data temporarily unavailable during migration
Consistent Hashing Solution
Consistent hashing ensures:
- Only K/n keys remap when a node is added (K = total keys, n = nodes)
- Only K/n keys remap when a node is removed
- All other keys stay on their existing nodes
- Smooth rebalancing without massive disruption
How Consistent Hashing Works
Core Concept: Hash Ring
Instead of hashing to a linear range [0, n-1], consistent hashing maps both data keys and nodes to a large circular address space (typically 0 to 2^32-1 or 2^64-1).
Routing Algorithm
To find which node stores a key:
- Hash the key: position = hash(key)
- Move clockwise around the ring from that position
- The first node encountered is the responsible node
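The routing steps above can be sketched with a sorted list plus binary search. The node names and the SHA-1-derived 32-bit positions are illustrative assumptions:

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Map keys and nodes onto a 32-bit circular space (sketch)
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")

class HashRing:
    def __init__(self, nodes):
        # Sorted (position, node) pairs form the ring
        self.ring = sorted((ring_hash(n), n) for n in nodes)
        self.positions = [p for p, _ in self.ring]

    def lookup(self, key: str) -> str:
        # First node clockwise from the key's position; % wraps past the end
        i = bisect.bisect_right(self.positions, ring_hash(key))
        return self.ring[i % len(self.ring)][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("user:42"))
```

With the positions kept sorted, each lookup is O(log n), and the modulo on the index implements the wrap-around from the highest position back to the lowest.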
Adding a Node
When adding a new node:
- Hash the node identifier to find its position
- The node takes ownership of keys in the range from its position clockwise to the next node
- Only those keys move (roughly 1/(n+1) of the total, for n existing nodes)
Removing a Node
When a node fails or is removed:
- The node's key range becomes orphaned
- The next clockwise node inherits the range
- Only keys from the failed node remap
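The add/remove behavior can be checked empirically. In this sketch (node and key names are made up), adding a node should move keys only onto the newcomer and leave everything else untouched:

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")

def build_ring(nodes):
    return sorted((ring_hash(n), n) for n in nodes)

def lookup(ring, key):
    positions = [p for p, _ in ring]
    i = bisect.bisect_right(positions, ring_hash(key))
    return ring[i % len(ring)][1]

nodes = [f"node-{i}" for i in range(10)]
keys = [f"user:{i}" for i in range(5000)]

ring10 = build_ring(nodes)
ring11 = build_ring(nodes + ["node-new"])
before = {k: lookup(ring10, k) for k in keys}
after = {k: lookup(ring11, k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Key invariant: every key that moved landed on the new node
assert all(after[k] == "node-new" for k in moved)
print(f"{len(moved) / len(keys):.1%} of keys moved")
```

The same invariant holds in reverse for removal: deleting a node only remaps the keys it owned, onto its clockwise successor.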
Virtual Nodes: Solving Non-Uniform Distribution
The Problem
Basic consistent hashing has issues:
- Non-uniform distribution: With few nodes, data distribution is uneven
- Hot spots: A single popular node gets disproportionate traffic
- Uneven capacity: Different nodes have different hardware capacity
Solution: Virtual Nodes (VNodes)
Each physical node is represented by multiple virtual nodes on the ring:
- Physical node A → Virtual nodes A1, A2, A3, ... A100
- Each virtual node gets its own position on the ring
- More virtual nodes = better distribution, more metadata overhead
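A minimal vnode sketch, assuming 100 virtual nodes per physical node and SHA-1-derived positions (both illustrative choices):

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(key: str) -> int:
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")

def build_vnode_ring(nodes, vnodes_per_node=100):
    # Each physical node appears at many positions: "node-a#0" .. "node-a#99"
    return sorted((ring_hash(f"{n}#{v}"), n)
                  for n in nodes for v in range(vnodes_per_node))

def lookup(ring, key):
    positions = [p for p, _ in ring]
    i = bisect.bisect_right(positions, ring_hash(key))
    return ring[i % len(ring)][1]

ring = build_vnode_ring(["node-a", "node-b", "node-c"], vnodes_per_node=100)
load = Counter(lookup(ring, f"user:{i}") for i in range(30_000))
print(load)
```

With 300 distribution points instead of 3, each node's count should land near the expected 10,000; weighting capacity is just a matter of giving a bigger node more vnodes.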
Benefits of Virtual Nodes
Better Load Distribution:
- 3 physical nodes with 100 vnodes each → 300 distribution points
- Reduces variance from ±50% to ±5%
- Approaches uniform distribution asymptotically
Heterogeneous Capacity:
- Powerful node: 200 virtual nodes
- Smaller node: 100 virtual nodes
- Larger nodes handle proportionally more data
Failure Domain Isolation:
- Each physical node's data spread across the ring
- One failure distributes load across multiple survivors
- Better load distribution during partial outages
Easier Rebalancing:
- Adding node: add its virtual nodes incrementally
- Each virtual node takes a small slice
- Gradual data movement, not bulk migration
Mathematical Analysis
Key Distribution
With N physical nodes and V virtual nodes per physical node:
- Total virtual nodes: N × V
- Expected share of keys per physical node: 1/N
- Standard deviation of load distribution: σ ≈ 1/√(N × V)
Example:
- 10 nodes, 100 vnodes each → 1000 total vnodes
- Expected keys per node: 10%
- Standard deviation: ~3.2%
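Plugging the example's numbers into the formulas above (a trivial check, using the document's own approximation for σ):

```python
from math import sqrt

N, V = 10, 100                 # physical nodes, vnodes per node (from the example)
total_vnodes = N * V
expected_share = 1 / N
sigma = 1 / sqrt(N * V)        # the document's approximation

print(total_vnodes)            # 1000
print(f"{expected_share:.0%}")  # 10%
print(f"{sigma:.1%}")           # 3.2%
```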
Data Movement on Topology Change
Adding one node:
- Old node count: n
- New node count: n+1
- Keys moved: approximately K/(n+1), where K = total keys
- Percentage moved: 1/(n+1)
Removing one node:
- Keys moved: approximately K/n
- Percentage moved: 1/n
Example with 1 billion keys:
- Traditional hashing (10→11 nodes): ~91% moved (~910M keys)
- Consistent hashing (10→11 nodes): ~9% moved (~90M keys)
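The same arithmetic, worked through with K and n taken from the example above:

```python
K = 1_000_000_000   # 1 billion keys
n = 10              # nodes before the resize

traditional = K * (1 - 1 / (n + 1))   # mod-n: only ~1/(n+1) of keys stay put
consistent = K / (n + 1)              # ring: only the new node's slice moves

print(f"traditional: ~{traditional / 1e6:.0f}M keys moved")  # ~909M
print(f"consistent:  ~{consistent / 1e6:.0f}M keys moved")   # ~91M
```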
Hash Function Selection
Requirements
Good Distribution:
- Uniform distribution across hash space
- Minimal collisions
- Avalanche effect (small input change → large output change)
Performance:
- Fast computation (critical for hot path)
- Minimal CPU overhead
- Cache-friendly access patterns
Determinism:
- Same input always produces same hash
- Critical for ring stability
Common Choices
| Hash Function | Bits | Speed | Quality | Notes |
|---|---|---|---|---|
| MurmurHash3 | 128 | Fast | Excellent | Non-cryptographic, widely used |
| xxHash | 64 | Very Fast | Very Good | CPU-optimized, popular |
| CityHash | 64/128 | Fast | Very Good | Google's hash, optimized for strings |
| SHA-1 | 160 | Slow | Excellent | Overkill for non-crypto use |
| SHA-256 | 256 | Slower | Excellent | Cryptographic, slower |
Recommendation: MurmurHash3 or xxHash for most distributed systems.
Hash Ring Size
Common choices:
- 32-bit ring (0 to 2^32-1): ~4 billion positions, sufficient for most systems
- 64-bit ring (0 to 2^64-1): collisions effectively impossible, used at massive scale
- SHA-1 (160-bit): Used by Amazon Dynamo, treated as circular
Larger ring = lower collision probability but more memory for metadata.
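One way to derive a ring position of a chosen width from a single digest (a sketch; SHA-1 here mirrors the Dynamo example, and truncating to the top bits is an illustrative strategy, not the only one):

```python
import hashlib

def ring_position(key: str, bits: int = 32) -> int:
    # Take the top `bits` bits of a SHA-1 digest as the ring position
    # (Dynamo-style systems treat the full 160-bit digest as circular)
    digest = hashlib.sha1(key.encode()).digest()   # 20 bytes = 160 bits
    return int.from_bytes(digest, "big") >> (160 - bits)

for bits in (32, 64, 160):
    print(bits, ring_position("user:42", bits))
```

The same key yields the same position at every width, so a deployment can pick its ring size once and keep lookups deterministic.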