Self-Balancing Binary Tree

The previous section on Binary Trees took a few liberties concerning asymptotic complexity. The purpose was to limit the number of new concepts introduced; however, with the introductory concepts out of the way, it’s time to correct the oversight. While most operations were advertised as $O(\log_{2}n)$, in reality this is only true if the tree is balanced. A balanced tree has roughly the same number of nodes on either side of every node, which keeps its height close to $\log_{2}n$. A more accurate measure of a tree’s asymptotic complexity is derived from the tree’s height. The height of a tree is the number of nodes on the path from the root to the most distant leaf. See the image below for a graphical reference.

Tree Height
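
To make the notion of height concrete, the C sketch below counts the nodes along the longest root-to-leaf path. The BinaryTreeNode struct and function names are hypothetical stand-ins chosen for illustration, not the exact definitions from the previous section.

#include <stddef.h>

typedef struct BinaryTreeNode {
    struct BinaryTreeNode* left;
    struct BinaryTreeNode* right;
    int item;
} BinaryTreeNode;

/* Height measured in nodes: an empty tree has height 0 and a tree
   consisting of only a root has height 1. */
size_t TreeHeight(const BinaryTreeNode* node) {
    if (node == NULL)
        return 0;

    size_t left_height = TreeHeight(node->left);
    size_t right_height = TreeHeight(node->right);

    /* One for the current node plus the taller subtree below it. */
    return 1 + (left_height > right_height ? left_height : right_height);
}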

Almost every tree operation requires a tree traversal (see pseudo code below). Consider how many steps it takes to traverse the trees in the image above. The unbalanced tree will require five steps. Even though the balanced tree has three times more nodes, the maximum number of steps to reach any node is four. This phenomenon compounds quickly as more nodes are inserted. It is more accurate to replace all the $O(\log_{2}n)$ run times in the previous section with $O(height)$ because traversing a completely unbalanced tree is a linear ($O(n)$) operation.

traverse:
    Node = root node of binary tree data structure
    Item = item to search for

    if Node is NULL:
        return NOT FOUND

    if Node->Item is equal to Item:
        return Node

    if Node->Item is greater than Item:
        // The sought item is smaller, so it can only be in the left subtree
        return traverse(Node->Left, Item)

    // If it's not equal or smaller, it must be in the right subtree
    return traverse(Node->Right, Item)
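
For reference, a minimal C version of the same search might look like the sketch below. It reuses the hypothetical BinaryTreeNode struct from the height sketch above; the names are illustrative only.

/* Returns the node holding item, or NULL if item is not in the tree.
   Each recursive call descends exactly one level, so the worst-case
   cost is proportional to the height of the tree. */
BinaryTreeNode* Traverse(BinaryTreeNode* node, int item) {
    if (node == NULL)
        return NULL;                        /* NOT FOUND */

    if (node->item == item)
        return node;                        /* found it */

    if (node->item > item)
        return Traverse(node->left, item);  /* sought item is smaller */

    return Traverse(node->right, item);     /* sought item is larger */
}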

As is hopefully clear by now, balance is a major concern with trees. Luckily, there are many tree data structures designed to maintain balance automatically. These are known as self-balancing search trees. Examples of such trees include, but are not limited to:

  • Red-black trees
  • Splay trees
  • AA trees
  • AVL trees
  • B-trees
  • 2-3 trees
  • 2-3-4 trees
  • Scapegoat trees
  • Treaps
  • Weight-Balanced trees
  • etc…

Many of the examples listed above are simply enhanced binary trees. However, some are more exotic, such as B-trees, which accommodate many keys per node. Each variation has slightly different runtime characteristics and utility.

This raises the question: if there are trees capable of self-balancing, why would anyone use a standard Binary Tree? The answer is that maintaining balance has overhead in the form of both added complexity and extra clock cycles (see the Principle of Parsimony). Self-balancing ensures optimal tree traversals at the cost of less-than-optimal update and insert operations. Therefore, the ratio of insert/delete operations to traversal operations is a consideration. Furthermore, when data is inserted in random order, a standard binary tree will, statistically, end up reasonably balanced anyway, as the experiment below demonstrates. As with most cases in computer science, the particular circumstances of the individual scenario determine the optimal solution. There is no one right answer.
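
That statistical claim is easy to check. The experiment below reuses the hypothetical BinaryTreeNode struct and TreeHeight function from the earlier sketches and adds a plain (non-balancing) Insert; it builds two trees from the same keys, once from sorted input and once from shuffled input. The sorted tree degenerates into a chain of height $n$, while the shuffled tree typically stays within a small constant factor of $\log_{2}n$.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Plain binary tree insert with no rebalancing: smaller items go to
   the left subtree, larger or equal items go to the right. */
BinaryTreeNode* Insert(BinaryTreeNode* node, int item) {
    if (node == NULL) {
        BinaryTreeNode* new_node = malloc(sizeof(BinaryTreeNode));
        new_node->left = NULL;
        new_node->right = NULL;
        new_node->item = item;
        return new_node;
    }

    if (item < node->item)
        node->left = Insert(node->left, item);
    else
        node->right = Insert(node->right, item);

    return node;
}

int main(void) {
    enum { n = 1023 };
    int keys[n];
    for (int i = 0; i < n; i++)
        keys[i] = i;

    /* Sorted insertion: every key is larger than the one before it, so
       the tree degenerates into a right-leaning chain of height n. */
    BinaryTreeNode* sorted_tree = NULL;
    for (int i = 0; i < n; i++)
        sorted_tree = Insert(sorted_tree, keys[i]);

    /* Fisher-Yates shuffle, then insert the same keys in random order. */
    srand((unsigned)time(NULL));
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int tmp = keys[i];
        keys[i] = keys[j];
        keys[j] = tmp;
    }
    BinaryTreeNode* random_tree = NULL;
    for (int i = 0; i < n; i++)
        random_tree = Insert(random_tree, keys[i]);

    printf("height after sorted insertion: %zu\n", TreeHeight(sorted_tree));
    printf("height after random insertion: %zu\n", TreeHeight(random_tree));
    return 0;
}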

One could write a book solely on tree data structures. In the event that the reader has an exceedingly performance-critical application for trees (perhaps implementing a database), it’s highly recommended that they delve deeper into this topic. For the majority of applications, the differences are trivial. For brevity, this source examines only Red-Black trees. Red-Black trees are popular, and the concepts translate easily to other tree variations.