The UnionFind data structure (aka DisjointSet) maintains sets of objects.
There are only three operations: make set
, union
, and find
. make set
create a new set. union
merges two sets into one and find
returns the set
for which an object is a member. Sets are tree^{1} data structures. Each
object in a set maintains a pointer to it’s parent. The root of the tree’s
parent pointer is self referential. This is best demonstrated graphically.
The picture below is a representation of a unionfind data structure after 6
make set
operations (make set A; make set B; ...
). Notice how each set has a
pointer that points back to itself. At this point, every object is root.
Executing a union operation on A
and C
deletes set3 and merges C
into
set1. A
is still a root but now C
’s parent pointer now points to A
.
Executing a union operation on A
and D
deletes set4 and merges D
into
set1.
Executing a union operations on E
and F
deletes set6 and merges F
into
set5.
Finally, executing a union operation on E
and A
merges set5 into set1
leaving only two remaining sets.
The find
operation returns the set for which an object is contained. Consider
the image above. find F
will return Set1. Staring at F
, it follows the
pointers until it reaches A
which has a selfreferential loop. This indicates
that it has found its containing set.
Improving Performance
There are two techniques that significantly improve the performance of unionfind data structures: union by rank and path compression.
Union by rank ensures that when two sets are merged, each leaf of the tree will
have the shortest possible path to the root. Shorter paths mean fewer
traversals for each find
operation. See the image below.
The general idea is to maintain a rank value for each set. Lower ranks are always consumed into sets with higher ranks. See the pseudo code for more details.
Path compression takes advantage of the find
operation. Each find operation
requires a path traversal to the root. It takes no extra time to update each
node to point directly to the root thereby compressing the path. This is
depicted graphically below.
Applications
 Kruskal’s Minimum Spanning Tree
 Detecting cycles in a graph
Asymptotic Complexity
 Make Set: $O(1)$
 Union: $O(\log n)$
 Find: $O(\log n)$
The figures above, though accurate, in the most technical sense, are deceiving.
While it’s true that a single union
or find
operation may take $O(\log n)$,
it’s actually closer to $O(1)$. Remember that each find
operation reduces the
run time to $O(1)$ for all future find
operations for all nodes in the path.
Assume that $n$ is the number of make set
operations and $m$ to be the total
number of make set
, union
, and find
operations combined^{2}. The
amortized asymptotic complexity works out to $O(m \alpha(n))$ Where $\alpha$ is
the inverse Ackerman
function. The particulars
of the function are not important. The salient concept is that the function
grows very slowly. It will not exceed $4$ for values of $n$ as high as
$10^{600}$.
Pseudo Code
make_set:
x = input object
x>parent = x
x>rank = 0
find:
x = input object
if x>parent != x:
x>parent = find(x) // path compression
return x>parent
union:
x = input object 1
y = input object 2
set1 = find(x)
set2 = find(y)
if set1 == set2:
return
if set1>rank > set2>rank
set2>parent = set1
else
set1>parent = set2
if set1>rank == set2>rank
increment set2>rank
Source Code
Relevant Files:
Click here for build and run instructions

A tree is a connected graph that contains no cycles. ↩

The maximum number of
union
operations that can be executed on any unionfind data structure is $n1$. At that point, there is only a single data set left. ↩