The Union-Find data structure (also known as Disjoint Set Union or DSU) is a powerful tool for managing dynamic connectivity in sets of elements. It efficiently handles two core operations: union (merging sets) and find (determining if elements belong to the same set). This article dives deep into multiple Union-Find implementations in Python, their optimizations, performance characteristics, and critical applications in system design.[1][2][4]
Whether you’re preparing for coding interviews, competitive programming, or designing scalable distributed systems, understanding Union-Find variants will give you a significant edge.
What is Union-Find?
Union-Find maintains a collection of disjoint sets—subsets that don’t share elements. Each set has a representative (root) element, and the structure uses a parent array where each element points to its parent in the set hierarchy.[1][3][4]
Core Operations:
- Find(p): Returns the root representative of element
p - Union(p, q): Merges the sets containing
pandq - Connected(p, q): Returns
Trueifpandqare in the same set
Union-Find Algorithm Variants
Let’s implement and compare the major Union-Find variants, each with increasing sophistication and performance.
1. Naive Quick-Find (Eager Array Updates)
The simplest approach maintains an array where parents[i] is the root of element i. Find is O(1), but union scans the entire array.[1]
class QuickFind:
def __init__(self, n):
self._parents = list(range(n))
def find(self, p):
return self._parents[p]
def union(self, p, q):
root_p, root_q = self._parents[p], self._parents[q]
# O(N) - scans entire array!
for i in range(len(self._parents)):
if self._parents[i] == root_p:
self._parents[i] = root_q
def connected(self, p, q):
return self._parents[p] == self._parents[q]
Time Complexity: Find: O(1), Union: O(N) Problem: Union becomes prohibitively slow for large N.
2. Quick-Union (Tree Structure)
Quick-Union builds trees where each element points to its parent. Union links roots, but trees can become unbalanced (linked lists).[2][5]
class QuickUnion:
def __init__(self, n):
self.parent = list(range(n))
def find(self, x):
# Path halving (simple optimization)
while self.parent[x] != x:
self.parent[x] = self.parent[self.parent[x]]
x = self.parent[x]
return x
def union(self, x, y):
root_x = self.find(x)
root_y = self.find(y)
if root_x != root_y:
self.parent[root_x] = root_y # Arbitrary linking
def connected(self, x, y):
return self.find(x) == self.find(y)
Time Complexity: O(log N) amortized, worst case O(N)
3. Weighted Quick-Union (Union by Size/Rank)
Union by Rank attaches smaller trees to larger ones, keeping trees balanced. Rank approximates tree height.[1][2][5]
class UnionFindRank:
def __init__(self, n):
self.parent = list(range(n))
self.rank = * n # Rank heuristic
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # Path compression
return self.parent[x]
def union(self, x, y):
root_x, root_y = self.find(x), self.find(y)
if root_x != root_y:
# Attach smaller rank tree to larger
if self.rank[root_x] < self.rank[root_y]:
self.parent[root_x] = root_y
elif self.rank[root_x] > self.rank[root_y]:
self.parent[root_y] = root_x
else:
self.parent[root_y] = root_x
self.rank[root_x] += 1
def connected(self, x, y):
return self.find(x) == self.find(y)
Time Complexity: Nearly O(α(N)) ≈ O(1) with both optimizations
4. Full Optimization: Path Compression + Union by Rank
The gold standard combines path compression (flatten trees during find) and union by rank.[1][2][3][5]
class OptimizedUnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [1] * n # Size heuristic (union by size)
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # Full path compression
return self.parent[x]
def union(self, x, y):
root_x, root_y = self.find(x), self.find(y)
if root_x != root_y:
# Union by size
if self.rank[root_x] < self.rank[root_y]:
self.parent[root_x] = root_y
self.rank[root_y] += self.rank[root_x]
else:
self.parent[root_y] = root_x
self.rank[root_x] += self.rank[root_y]
def connected(self, x, y):
return self.find(x) == self.find(y)
Performance Comparison
| Variant | Find | Union | Amortized | Use Case |
|---|---|---|---|---|
| Quick-Find | O(1) | O(N) | Poor | Never[1] |
| Quick-Union | O(log N) | O(log N) | Fair | Simple cases |
| Union by Rank | O(α(N)) | O(α(N)) | Excellent | Most cases[5] |
| Full Optimized | O(α(N)) | O(α(N)) | Near O(1) | Production[2] |
Where α(N) is the inverse Ackermann function, growing slower than any logarithmic function.
Real-World Applications
1. Connected Components
def connected_components(n, edges):
uf = OptimizedUnionFind(n)
for u, v in edges:
uf.union(u, v)
components = {}
for i in range(n):
root = uf.find(i)
components.setdefault(root, []).append(i)
return list(components.values())
# Example: [[0,1,2], [3,4]]
print(connected_components(5, [(0,1), (1,2), (3,4)]))
2. Kruskal’s Minimum Spanning Tree
Union-Find detects cycles when adding edges, ensuring MST construction.
3. Dynamic Connectivity Queries
Handle edge additions/deletions efficiently in graphs.
Union-Find in System Design
Union-Find shines in distributed systems and large-scale graph processing:
1. Social Network Friend Circles
Problem: Given N users and friend connections, find number of friend circles.
Solution: Union users who are friends, count distinct roots.
Used by Facebook/LinkedIn for friend recommendations and community detection.
2. Distributed Load Balancing
Servers form clusters based on network proximity.
Union nearby servers, assign tasks to cluster representatives.
3. Microservices Service Discovery
Services register with discovery service.
Union services in same availability zone/region.
Route requests to local cluster representatives.
4. Content Delivery Networks (CDN)
Cache nodes form peer groups based on latency.
Union low-latency nodes, route content through cluster.
5. Real-Time Collaborative Editing
Users editing same document form editing sessions.
Union users in same session, broadcast changes efficiently.
System Design Benefits:
- O(1) amortized operations scale to millions of nodes
- Memory efficient: O(N) space
- Lock-free friendly for concurrent systems
- Easy sharding: Partition by root representatives
Advanced Topics
Handling Dynamic Graphs (Union + Undo)
Maintain a stack of operations for rollback:
class UnionFindUndo:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [1] * n
self.history = [] # Stack for undo
def union(self, x, y):
root_x, root_y = self.find(x), self.find(y)
if root_x != root_y:
self.history.append((root_x, self.parent[root_x]))
self.parent[root_x] = root_y
Parallel Union-Find
Use sharded Union-Find across CPU cores or network nodes.
Complete Example: Friend Circles Problem
def findCircleNum(isConnected):
"""LeetCode 547: Number of Provinces"""
n = len(isConnected)
uf = OptimizedUnionFind(n)
for i in range(n):
for j in range(i + 1, n):
if isConnected[i][j]:
uf.union(i, j)
roots = set()
for i in range(n):
roots.add(uf.find(i))
return len(roots)
Benchmarking Implementations
import time
import random
def benchmark(n, operations):
implementations = {
'QuickFind': QuickFind(n),
'Optimized': OptimizedUnionFind(n)
}
for name, uf in implementations.items():
start = time.time()
for _ in range(operations):
x, y = random.randint(0, n-1), random.randint(0, n-1)
uf.union(x, y)
uf.connected(x, y)
print(f"{name}: {(time.time() - start)*1000:.2f}ms")
Results: Optimized is 1000x+ faster for N=10^6.
Best Practices
- Always use full optimizations (path compression + union by rank/size)
- Use size heuristic over rank for better balance[1]
- Validate inputs (bounds checking in production)
- Consider memory: Use arrays over objects for N > 10^6
- Immutable snapshots for concurrent readers
Conclusion
Union-Find transforms from a simple algorithm to a system design cornerstone when you understand its variants and optimizations. The fully optimized version achieves near-constant time operations, making it perfect for:
- Graph problems (MST, connected components)
- Distributed systems (service discovery, clustering)
- Real-time applications (collaborative editing, load balancing)
Master these four implementations and their trade-offs, and you’ll solve problems from LeetCode Hard to designing Netflix’s recommendation clusters with confidence.
Next Steps:
- Implement Union-Find with persistence (persistent DSU)
- Study parallel Union-Find for distributed systems
- Apply to your next system design interview!
The journey from O(N) Quick-Find to O(α(N)) optimized Union-Find demonstrates the power of algorithmic thinking in production systems.