Union-Find (Disjoint Set)

The Union-Find data structure (aka Disjoint-Set) maintains sets of objects. There are only three operations: make set, union, and find. make set create a new set. union merges two sets into one and find returns the set for which an object is a member. Sets are tree1 data structures. Each object in a set maintains a pointer to it’s parent. The root of the tree’s parent pointer is self referential. This is best demonstrated graphically.

The picture below is a representation of a union-find data structure after 6 make set operations (make set A; make set B; ...). Notice how each set has a pointer that points back to itself. At this point, every object is root.

Make Set

Executing a union operation on A and C deletes set-3 and merges C into set-1. A is still a root but now C’s parent pointer now points to A.

Make Set

Executing a union operation on A and D deletes set-4 and merges D into set-1.

Make Set

Executing a union operations on E and F deletes set-6 and merges F into set-5.

Make Set

Finally, executing a union operation on E and A merges set-5 into set-1 leaving only two remaining sets.

Make Set

The find operation returns the set for which an object is contained. Consider the image above. find F will return Set-1. Staring at F, it follows the pointers until it reaches A which has a self-referential loop. This indicates that it has found its containing set.

Improving Performance

There are two techniques that significantly improve the performance of union-find data structures: union by rank and path compression.

Union by rank ensures that when two sets are merged, each leaf of the tree will have the shortest possible path to the root. Shorter paths mean fewer traversals for each find operation. See the image below.


The general idea is to maintain a rank value for each set. Lower ranks are always consumed into sets with higher ranks. See the pseudo code for more details.

Path compression takes advantage of the find operation. Each find operation requires a path traversal to the root. It takes no extra time to update each node to point directly to the root thereby compressing the path. This is depicted graphically below.



Asymptotic Complexity

  • Make Set: $O(1)$
  • Union: $O(\log n)$
  • Find: $O(\log n)$

The figures above, though accurate, in the most technical sense, are deceiving. While it’s true that a single union or find operation may take $O(\log n)$, it’s actually closer to $O(1)$. Remember that each find operation reduces the run time to $O(1)$ for all future find operations for all nodes in the path.

Assume that $n$ is the number of make set operations and $m$ to be the total number of make set, union, and find operations combined2. The amortized asymptotic complexity works out to $O(m \alpha(n))$ Where $\alpha$ is the inverse Ackerman function. The particulars of the function are not important. The salient concept is that the function grows very slowly. It will not exceed $4$ for values of $n$ as high as $10^{600}$.

Pseudo Code

    x = input object

    x->parent = x
    x->rank = 0

    x = input object

    if x->parent != x:
        x->parent = find(x) // path compression

   return x->parent

    x = input object 1
    y = input object 2

    set1 = find(x)
    set2 = find(y)

    if set1 == set2:

    if set1->rank > set2->rank
        set2->parent = set1
        set1->parent = set2

        if set1->rank == set2->rank
            increment set2->rank 

Source Code

Full Repo

Relevant Files:

Click here for build and run instructions

  1. A tree is a connected graph that contains no cycles

  2. The maximum number of union operations that can be executed on any union-find data structure is $n-1$. At that point, there is only a single data set left.