Binary Tree

Just like arrays and linked lists, binary trees are a list data structure (technically, a graph data structure, but graph concepts aren’t germane to this topic). They are the most complex list data structure covered here, so the reader is highly encouraged to thoroughly examine the provided pseudo and source code.

As the name implies, binary trees arrange data in a tree-like structure with a root, branches, and leaves. Each node in a binary tree has a left and a right pointer. The left pointer points to a node holding a lesser-valued item, and the right pointer points to a node holding a greater-valued item. Furthermore, every descendant to the left of a given node has a lesser value, and likewise every descendant to the right has a greater value. Any item can end up as the root node, which acts as the entry point to the structure. The image below depicts a typical binary tree.
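The node structure described above can be sketched in a few lines of Python. This is a minimal illustration; the class and field names are assumptions, not taken from the accompanying source code.

```python
# Illustrative sketch of a binary tree node; names are assumptions,
# not taken from the accompanying source code.
class Node:
    def __init__(self, item):
        self.item = item
        self.left = None   # root of the subtree of lesser items
        self.right = None  # root of the subtree of greater items

# Build a three-node tree with 50 as the root/entry point.
root = Node(50)
root.left = Node(25)    # lesser item goes left
root.right = Node(75)   # greater item goes right
```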

Binary Tree

An interesting property of binary trees is that there are multiple valid ways to rearrange data. For instance, consider the binary trees depicted below. Both trees are valid, and contain the same data.

Same Data, Different Valid Arrangements

Binary Tree Data Arrangement

Binary trees provide all the same utility as Sorted Arrays with the added bonus of performant inserts and deletes. The table below compares the asymptotic complexity of operations on the two data structures. As shown, sorted arrays do have slightly faster run times for some operations. However, if inserts and deletes are a requirement, binary trees are the better option.

| Operation             | Sorted Array      | Binary Tree        |
| --------------------- | ----------------- | ------------------ |
| Insert\Delete         | $O(n)$¹           | $O(\log_{2}n)$     |
| Search                | $O(\log_{2}n)$    | $O(\log_{2}n)$     |
| Enumerate             | $O(n)$            | $O(n+\log_{2}n)$   |
| Min\Max               | $O(1)$            | $O(\log_{2}n)$     |
| Predecessor\Successor | $O(\log_{2}n)$    | $O(\log_{2}n)$     |
| Select                | $O(1)$            | $O(\log_{2}n)$     |
| Rank                  | $O(\log_{2}n)$    | $O(\log_{2}n)$     |

As a side note, a more appropriate name may be inverted tree because they look like an upside-down tree. Alas, we are bound by convention. Remember the words of Bertrand Russell, “Conventional people are roused to fury by departure from convention, largely because they regard such departure as a criticism of themselves.”

Applications

Asymptotic Complexity

  • Insert\Delete: $O(\log_{2}n)$
  • Search: $O(\log_{2}n)$
  • Enumerate: $O(n + \log_{2}n)$
  • Min: $O(\log_{2}n)$
  • Max: $O(\log_{2}n)$
  • Predecessor: $O(\log_{2}n)$
  • Successor: $O(\log_{2}n)$
  • Select: $O(\log_{2}n)$
  • Rank: $O(\log_{2}n)$

Pseudo Code

insert:
    node: tree node to start from
    item: item to insert

    if item is less than node:
        if node->left is NULL:
            node->left = item
            return
        else
            insert(node->left, item)
    else
        if node->right is NULL:
            node->right = item
            return
        else
            insert(node->right, item)
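The insert routine above can be rendered in Python roughly as follows. This is a sketch for illustration; the `Node` class and names are assumptions rather than the repository's actual source code.

```python
class Node:
    def __init__(self, item):
        self.item, self.left, self.right = item, None, None

def insert(node, item):
    # Descend toward the correct NULL slot, then attach a new node there.
    if item < node.item:
        if node.left is None:
            node.left = Node(item)
        else:
            insert(node.left, item)
    else:
        if node.right is None:
            node.right = Node(item)
        else:
            insert(node.right, item)

root = Node(50)
for value in (25, 75, 10, 30):
    insert(root, value)
```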

delete:
    node: tree node to start from
    item: item to delete - item will have exactly 0, 1, or 2 children

    if item has 0 children:
        delete pointer to item in parent node
        delete item
        return

    if item has 1 child:
        update item->parent to point to child instead of item
        update child->parent to item->parent
        delete item
        return

    // item has 2 children
    largest_left = largest item in the left subtree
    delete largest_left from tree
    replace item with largest_left
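One way to sketch the delete logic in Python is the common return-the-new-subtree-root idiom, which avoids the parent pointers the pseudo code assumes. The `Node` class and sample tree values are illustrative assumptions.

```python
class Node:
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right = item, left, right

def delete(node, item):
    # Returns the root of the subtree with item removed.
    if node is None:
        return None
    if item < node.item:
        node.left = delete(node.left, item)
    elif item > node.item:
        node.right = delete(node.right, item)
    elif node.left is None:    # 0 children, or only a right child
        return node.right
    elif node.right is None:   # only a left child
        return node.left
    else:
        # two children: replace with the largest item in the left subtree
        largest = node.left
        while largest.right is not None:
            largest = largest.right
        node.item = largest.item
        node.left = delete(node.left, largest.item)
    return node

root = Node(50, Node(25, Node(10), Node(30)), Node(75))
root = delete(root, 50)  # exercises the two-children case
```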

search:
    node: tree node to start from
    item: item to search for

    if node is NULL:
        return NULL

    if item equals node:
        return node

    if item is less than node:
        return search(node->left, item)

    // item must be greater b/c it's not equal or less
    return search(node->right, item)
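The same search logic in Python, under the same illustrative `Node` class assumed throughout these sketches:

```python
class Node:
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right = item, left, right

def search(node, item):
    if node is None:
        return None          # item is not in the tree
    if item == node.item:
        return node
    if item < node.item:
        return search(node.left, item)
    return search(node.right, item)  # must be greater

root = Node(50, Node(25, Node(10), Node(30)), Node(75))
```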

enumerate:
    node: tree node to start from

    if node is NULL:
        return

    enumerate(node->left)
    output node
    enumerate(node->right)
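Enumeration is an in-order traversal: everything in the left subtree precedes the node, which precedes everything in the right subtree, so items come out in sorted order. A sketch (returning a list rather than printing, purely for illustration):

```python
class Node:
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right = item, left, right

def enumerate_tree(node):
    # In-order traversal: left subtree, then the node, then the right subtree.
    if node is None:
        return []
    return enumerate_tree(node.left) + [node.item] + enumerate_tree(node.right)

root = Node(50, Node(25, Node(10), Node(30)), Node(75))
```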

min:
    node: tree node to start from

    if node->left == NULL:
        return node

    return min(node->left)

max:
    node: tree node to start from

    if node->right == NULL:
        return node

    return max(node->right)
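Min and max simply follow the leftmost and rightmost paths, which iterative loops express just as naturally as the recursion above (names are illustrative):

```python
class Node:
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right = item, left, right

def tree_min(node):
    # The minimum lives at the far left of the tree.
    while node.left is not None:
        node = node.left
    return node

def tree_max(node):
    # The maximum lives at the far right of the tree.
    while node.right is not None:
        node = node.right
    return node

root = Node(50, Node(25, Node(10), Node(30)), Node(75))
```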

predecessor:
    item: item to find predecessor of

    if item->left is not NULL:
        return max(item->left)

    return first parent of item that is less than item

successor:
    item: item to find successor of

    if item->right is not NULL:
        return min(item->right)

    return first parent of item that is greater than item
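The pseudo code above walks parent pointers to find the "first parent" case. If nodes don't store parent pointers, the same answer can be found by descending from the root and remembering the closest qualifying ancestor, as in this sketch (assumed names, not the repository's code):

```python
class Node:
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right = item, left, right

def successor(root, item):
    # Closest item greater than item: the last node where we turned left.
    succ = None
    while root is not None:
        if item < root.item:
            succ, root = root, root.left
        else:
            root = root.right
    return succ

def predecessor(root, item):
    # Closest item less than item: the last node where we turned right.
    pred = None
    while root is not None:
        if item > root.item:
            pred, root = root, root.right
        else:
            root = root.left
    return pred

root = Node(50, Node(25, Node(10), Node(30)), Node(75))
```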

select:
    node: tree node to start from
    index: index of the item to select

    left = number of items in left subtree

    if left equals index:
        return node

    if index is less than left:
        return select(node->left, index)

    return select(node->right, index - left - 1)
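The select pseudo code assumes each node knows its left-subtree size. This sketch recounts the size on every call, which is O(n) rather than the O(log₂n) a size-augmented tree achieves, but it keeps the example self-contained (names are illustrative):

```python
class Node:
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right = item, left, right

def size(node):
    # A production tree would cache this count in each node.
    return 0 if node is None else 1 + size(node.left) + size(node.right)

def select(node, index):
    left = size(node.left)
    if index == left:
        return node
    if index < left:
        return select(node.left, index)
    return select(node.right, index - left - 1)

root = Node(50, Node(25, Node(10), Node(30)), Node(75))
```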

rank:
    node: tree node to start from
    item: item to find rank for
    offset: cumulative offset of the index

    left = number of items in left subtree

    if node equals item:
        return left + offset

    if item is less than node:
        return rank(node->left, item, offset)

    return rank(node->right, item, offset + left + 1)
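Rank is the inverse of select. As with select, this Python sketch recounts subtree sizes for simplicity, where a size-augmented tree would cache them (names are assumptions):

```python
class Node:
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right = item, left, right

def size(node):
    return 0 if node is None else 1 + size(node.left) + size(node.right)

def rank(node, item, offset=0):
    left = size(node.left)   # cached in a size-augmented tree
    if node.item == item:
        return left + offset
    if item < node.item:
        return rank(node.left, item, offset)
    return rank(node.right, item, offset + left + 1)

root = Node(50, Node(25, Node(10), Node(30)), Node(75))
```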
    

The delete operation in the pseudo code above can be a bit tricky to understand without a visual. There are three possible paths a delete operation can take:

  1. The node to delete has no children (leaf node)
  2. The node to delete has a single child
  3. The node to delete has two children

The image below depicts each path graphically.

Binary Tree Delete

Source Code

Full Repo

Relevant Files:

Click here for build and run instructions

Advantages

  • Search: Optimized for quick search operations
  • Insert\Delete: Although slower than linked lists, considerably faster than arrays.
  • Utility: The most utility of any other list data structure

Disadvantages

  • Memory: Each item maintains two additional pointers. The total size of a binary tree is the combined size of the items plus the size of two pointers times the number of items.
  • Cache Performance: It is possible to create a binary tree with poor spatial locality, which can have profound performance implications. See the Memory Cache Hierarchy section for more details.
  • Insertion Order: The order in which items are inserted is lost
  • Complexity: The most complex list data structure.
  1. While this is technically only a linear run time, array insert/delete operations require reallocating larger/smaller chunks of memory and rearranging existing data. Therefore, the constant factors hidden by big O notation are significant.