Self-Balancing Binary Tree

The previous section on Binary Trees took a few liberties concerning asymptotic complexity. The purpose was to limit the number of new concepts introduced; however, with the introductory concepts out of the way, it’s time to correct the oversight. While most operations were advertised as $O(\log_{2}n)$, in reality this is only true if the tree is balanced. A balanced tree has roughly the same number of nodes on either side of every node, which keeps its height close to $\log_{2}n$. A more accurate measure of a tree’s asymptotic complexity is derived from the tree’s height. The height of a tree is the number of nodes on the path from the root to the most distant leaf. See the image below for a graphical reference.

Tree Height
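
To make the notion of height concrete, the C sketch below counts the nodes along the longest root-to-leaf path. The BinaryTreeNode struct and function names are hypothetical stand-ins chosen for illustration, not the exact definitions from the previous section.

#include <stddef.h>

typedef struct BinaryTreeNode {
    struct BinaryTreeNode* left;
    struct BinaryTreeNode* right;
    int item;
} BinaryTreeNode;

/* Height measured in nodes: an empty tree has height 0 and a tree
   consisting of only a root has height 1. */
size_t TreeHeight(const BinaryTreeNode* node) {
    if (node == NULL)
        return 0;

    size_t left_height = TreeHeight(node->left);
    size_t right_height = TreeHeight(node->right);

    /* One for the current node plus the taller subtree below it. */
    return 1 + (left_height > right_height ? left_height : right_height);
}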

Almost every tree operation requires a tree traversal (see pseudo code below). Consider how many steps it takes to traverse the trees in the image above. The unbalanced tree will require five steps. Even though the balanced tree has three times more nodes, the maximum number of steps to reach any node is four. This phenomenon compounds quickly as more nodes are inserted. It is more accurate to replace all the $O(\log_{2}n)$ run times in the previous section with $O(height)$ because traversing a completely unbalanced tree is a linear ($O(n)$) operation.

traverse:
    Node = root node of binary tree data structure
    Item = item to search for

    if Node is NULL:
        return NOT FOUND

    if Node->Item is equal to Item:
        return Node

    if Node->Item is greater than Item:
        // The sought item is smaller, so it can only be in the left subtree
        return traverse(Node->Left, Item)

    // If it's not equal or smaller, it must be in the right subtree
    return traverse(Node->Right, Item)
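
For reference, a minimal C version of the same search might look like the sketch below. It reuses the hypothetical BinaryTreeNode struct from the height sketch above; the names are illustrative only.

/* Returns the node holding item, or NULL if item is not in the tree.
   Each recursive call descends exactly one level, so the worst-case
   cost is proportional to the height of the tree. */
BinaryTreeNode* Traverse(BinaryTreeNode* node, int item) {
    if (node == NULL)
        return NULL;                        /* NOT FOUND */

    if (node->item == item)
        return node;                        /* found it */

    if (node->item > item)
        return Traverse(node->left, item);  /* sought item is smaller */

    return Traverse(node->right, item);     /* sought item is larger */
}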

As is hopefully clear by now, balance is a major concern with trees. Luckily, there are many tree data structures designed to maintain balance automatically. These are known as self-balancing search trees. Examples of such trees include, but are not limited to:

  • Red-black trees
  • Splay trees
  • AA trees
  • AVL trees
  • B-trees
  • 2-3 trees
  • 2-3-4 trees
  • Scapegoat trees
  • Treaps
  • Weight-Balanced trees
  • etc…

Many of the examples listed above are simply enhanced binary trees. However, some are more exotic, such as B-trees, which accommodate many keys per node. Each variation has slightly different runtime characteristics and utility.

This raises the question: if there are trees capable of self-balancing, why would anyone use a standard Binary Tree? The answer is that maintaining balance has overhead in the form of both added complexity and extra clock cycles (see the Principle of Parsimony). Self-balancing ensures optimal tree traversals at the cost of less-than-optimal update and insert operations. Therefore, the ratio of insert/delete operations to traversal operations is a consideration. Furthermore, when data is inserted in random order, a standard binary tree will, statistically, end up reasonably balanced anyway, as the experiment below demonstrates. As with most cases in computer science, the particular circumstances of the individual scenario determine the optimal solution. There is no one right answer.
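
That statistical claim is easy to check. The experiment below reuses the hypothetical BinaryTreeNode struct and TreeHeight function from the earlier sketches and adds a plain (non-balancing) Insert; it builds two trees from the same keys, once from sorted input and once from shuffled input. The sorted tree degenerates into a chain of height $n$, while the shuffled tree typically stays within a small constant factor of $\log_{2}n$.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Plain binary tree insert with no rebalancing: smaller items go to
   the left subtree, larger or equal items go to the right. */
BinaryTreeNode* Insert(BinaryTreeNode* node, int item) {
    if (node == NULL) {
        BinaryTreeNode* new_node = malloc(sizeof(BinaryTreeNode));
        new_node->left = NULL;
        new_node->right = NULL;
        new_node->item = item;
        return new_node;
    }

    if (item < node->item)
        node->left = Insert(node->left, item);
    else
        node->right = Insert(node->right, item);

    return node;
}

int main(void) {
    enum { n = 1023 };
    int keys[n];
    for (int i = 0; i < n; i++)
        keys[i] = i;

    /* Sorted insertion: every key is larger than the one before it, so
       the tree degenerates into a right-leaning chain of height n. */
    BinaryTreeNode* sorted_tree = NULL;
    for (int i = 0; i < n; i++)
        sorted_tree = Insert(sorted_tree, keys[i]);

    /* Fisher-Yates shuffle, then insert the same keys in random order. */
    srand((unsigned)time(NULL));
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int tmp = keys[i];
        keys[i] = keys[j];
        keys[j] = tmp;
    }
    BinaryTreeNode* random_tree = NULL;
    for (int i = 0; i < n; i++)
        random_tree = Insert(random_tree, keys[i]);

    printf("height after sorted insertion: %zu\n", TreeHeight(sorted_tree));
    printf("height after random insertion: %zu\n", TreeHeight(random_tree));
    return 0;
}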

One could write a book solely on tree data structures. In the event that the reader has an exceedingly performance-critical application for trees (perhaps implementing a database), it’s highly recommended that they delve deeper into this topic. For the majority of applications, the differences are trivial. For brevity, this source examines only Red-Black trees. Red-Black trees are popular, and the concepts translate easily to other tree variations.