The Union-Find data structure (aka Disjoint-Set) maintains sets of objects.
There are only three operations:
create a new set.
union merges two sets into one and
find returns the set
for which an object is a member. Sets are tree1 data structures. Each
object in a set maintains a pointer to it’s parent. The root of the tree’s
parent pointer is self referential. This is best demonstrated graphically.
The picture below is a representation of a union-find data structure after 6
make set operations (
make set A; make set B; ...). Notice how each set has a
pointer that points back to itself. At this point, every object is root.
Executing a union operation on
C deletes set-3 and merges
A is still a root but now
C’s parent pointer now points to
Executing a union operation on
D deletes set-4 and merges
Executing a union operations on
F deletes set-6 and merges
Finally, executing a union operation on
A merges set-5 into set-1
leaving only two remaining sets.
find operation returns the set for which an object is contained. Consider
the image above.
find F will return Set-1. Staring at
F, it follows the
pointers until it reaches
A which has a self-referential loop. This indicates
that it has found its containing set.
There are two techniques that significantly improve the performance of union-find data structures: union by rank and path compression.
Union by rank ensures that when two sets are merged, each leaf of the tree will
have the shortest possible path to the root. Shorter paths mean fewer
traversals for each
find operation. See the image below.
The general idea is to maintain a rank value for each set. Lower ranks are always consumed into sets with higher ranks. See the pseudo code for more details.
Path compression takes advantage of the
find operation. Each find operation
requires a path traversal to the root. It takes no extra time to update each
node to point directly to the root thereby compressing the path. This is
depicted graphically below.
- Make Set: $O(1)$
- Union: $O(\log n)$
- Find: $O(\log n)$
The figures above, though accurate, in the most technical sense, are deceiving.
While it’s true that a single
find operation may take $O(\log n)$,
it’s actually closer to $O(1)$. Remember that each
find operation reduces the
run time to $O(1)$ for all future
find operations for all nodes in the path.
Assume that $n$ is the number of
make set operations and $m$ to be the total
find operations combined2. The
amortized asymptotic complexity works out to $O(m \alpha(n))$ Where $\alpha$ is
the inverse Ackerman
function. The particulars
of the function are not important. The salient concept is that the function
grows very slowly. It will not exceed $4$ for values of $n$ as high as
make_set: x = input object x->parent = x x->rank = 0 find: x = input object if x->parent != x: x->parent = find(x) // path compression return x->parent union: x = input object 1 y = input object 2 set1 = find(x) set2 = find(y) if set1 == set2: return if set1->rank > set2->rank set2->parent = set1 else set1->parent = set2 if set1->rank == set2->rank increment set2->rank
Click here for build and run instructions