Just like arrays and linked lists, binary trees are another list data structure (technically, a graph data structure, but graph concepts aren't germane to this topic). They are the most complex of the list data structures, so the reader is highly encouraged to thoroughly examine the provided pseudo and source code.

As the name implies, binary trees arrange data in a tree-like structure with a
root, branches, and leaves. Each node in a binary tree has a left and a right
pointer. The left pointer points to a node holding a *lesser valued* item and the
right pointer points to a node holding a *greater valued* item. Furthermore, every
descendant to the left of a given node has a lesser value, and likewise every
descendant to the right has a greater value. The root node can hold any item in
the structure, depending on how the data is arranged, and acts as the entry
point. The image below depicts a typical binary tree.
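In C, such a node might be declared as shown below. This is a minimal sketch; the field names and the `obeys_order` helper are illustrative rather than taken from the provided source.

```c
#include <stddef.h>

// One node of a binary tree: an item plus pointers to the lesser
// (left) and greater (right) subtrees. NULL marks a missing child.
typedef struct node {
    int item;            // the stored value (an int for illustration)
    struct node *left;   // subtree of lesser values
    struct node *right;  // subtree of greater values
} node;

// Returns 1 if child may legally hang on the given side of parent,
// 0 otherwise -- a direct statement of the ordering rule above.
int obeys_order(const node *parent, const node *child, int is_left) {
    return is_left ? child->item < parent->item
                   : child->item > parent->item;
}
```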

#### Binary Tree

An interesting property of binary trees is that there are multiple valid ways to arrange the same data. For instance, consider the binary trees depicted below. Both trees are valid, and both contain the same data.

#### Same Data, Different Valid Arrangements

Binary trees provide all the same utility as sorted arrays with the added bonus of performant inserts and deletes. The table below compares the asymptotic complexity of operations on the two data structures. As shown, sorted arrays do have slightly faster run times for some operations. However, if inserts and deletes are a requirement, binary trees are always the better option.

Operation | Sorted Array | Binary Tree |
---|---|---|
Insert\Delete | $O(n)$^{1} | $O(\log_{2}n)$ |
Search | $O(\log_{2}n)$ | $O(\log_{2}n)$ |
Enumerate | $O(n)$ | $O(n+\log_{2}n)$ |
Min\Max | $O(1)$ | $O(\log_{2}n)$ |
Predecessor\Successor | $O(\log_{2}n)$ | $O(\log_{2}n)$ |
Select | $O(1)$ | $O(\log_{2}n)$ |
Rank | $O(\log_{2}n)$ | $O(\log_{2}n)$ |

^{1}While this is technically only a linear run time, array insert/delete operations require reallocating larger/smaller chunks of memory and rearranging existing data. Therefore, the constant factors hidden by big O notation are significant.
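To make that hidden cost concrete, here is a minimal C sketch of inserting into a sorted array. The `sorted_insert` helper and the caller-managed capacity are illustrative assumptions, not part of the provided source.

```c
#include <stddef.h>
#include <string.h>

// Inserts item into a sorted array of *len elements, assuming the
// caller has reserved capacity for at least one more. Every element
// after the insertion point must shift right by one slot -- the O(n)
// data movement the footnote refers to.
void sorted_insert(int *arr, size_t *len, int item) {
    size_t i = 0;
    while (i < *len && arr[i] < item)  // find the insertion point
        i++;
    // shift the tail right by one element to open a slot
    memmove(&arr[i + 1], &arr[i], (*len - i) * sizeof(int));
    arr[i] = item;
    (*len)++;
}
```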

As a side note, a more appropriate name may be *inverted* tree because they look
like an upside-down tree. Alas, we are bound by convention. Remember the words
of Bertrand Russell: “Conventional people are roused to fury by departure from
convention, largely because they regard such departure as a criticism of
themselves.”

## Applications

## Asymptotic Complexity

- Insert\Delete: $O(\log_{2}n)$
- Search: $O(\log_{2}n)$
- Enumerate: $O(n + \log_{2}n)$
- Min: $O(\log_{2}n)$
- Max: $O(\log_{2}n)$
- Predecessor: $O(\log_{2}n)$
- Successor: $O(\log_{2}n)$
- Select: $O(\log_{2}n)$
- Rank: $O(\log_{2}n)$

## Pseudo Code

```
insert:
  node: tree node to start from
  item: item to insert
  if item is less than node:
    if node->left is NULL:
      node->left = item
      return
    else:
      insert(node->left, item)
  else:
    if node->right is NULL:
      node->right = item
      return
    else:
      insert(node->right, item)

delete:
  node: tree node to start from
  item: item to delete - item will have exactly 0, 1, or 2 children
  if item has 0 children:
    delete pointer to item in parent node
    delete item
    return
  if item has 1 child:
    update item->parent to point to child instead of item
    update child->parent to item->parent
    delete item
    return
  // item has 2 children
  largest_left = largest item in the left subtree
  delete largest_left from tree
  replace item with largest_left

search:
  node: tree node to start from
  item: item to search for
  if item equals node:
    return node
  if item is less than node:
    return search(node->left, item)
  // item must be greater b/c it's not equal or less
  return search(node->right, item)

enumerate:
  node: tree node to start from
  enumerate(node->left)
  output node
  enumerate(node->right)

min:
  node: tree node to start from
  if node->left is NULL:
    return node
  return min(node->left)

max:
  node: tree node to start from
  if node->right is NULL:
    return node
  return max(node->right)

predecessor:
  item: item to find predecessor of
  if item->left is not NULL:
    return max(item->left)
  return first parent of item that is less than item

successor:
  item: item to find successor of
  if item->right is not NULL:
    return min(item->right)
  return first parent of item that is greater than item

select:
  node: tree node to start from
  index: index of the item to select
  left = number of items in left subtree
  if left equals index:
    return node
  if index is less than left:
    return select(node->left, index)
  return select(node->right, index - left - 1)

rank:
  node: tree node to start from
  item: item to find rank for
  offset: accumulated offset of the index
  left = number of items in left subtree
  if node equals item:
    return left + offset
  if item is less than node:
    return rank(node->left, item, offset)
  return rank(node->right, item, offset + left + 1)
```
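The insert and search routines translate almost mechanically into C. The sketch below assumes an integer-keyed node type; `make_node` is an illustrative helper, not part of the provided source.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct node {
    int item;
    struct node *left;
    struct node *right;
} node;

// Allocates a leaf node holding item; returns NULL on failure.
node *make_node(int item) {
    node *n = calloc(1, sizeof(node));
    if (n != NULL)
        n->item = item;
    return n;
}

// Inserts item below root, descending left for lesser values and
// right otherwise, mirroring the pseudo code above.
void insert(node *root, int item) {
    if (item < root->item) {
        if (root->left == NULL)
            root->left = make_node(item);
        else
            insert(root->left, item);
    } else {
        if (root->right == NULL)
            root->right = make_node(item);
        else
            insert(root->right, item);
    }
}

// Returns the node holding item, or NULL if it is not in the tree.
node *search(node *root, int item) {
    if (root == NULL || root->item == item)
        return root;
    if (item < root->item)
        return search(root->left, item);
    return search(root->right, item);  // item must be greater
}
```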

The delete operation in the pseudo code above can be a bit tricky to understand without a visual. There are three possible paths a delete operation can take:

- The node to delete has no children (leaf node)
- The node to delete has a single child
- The node to delete has two children

The image below depicts each path graphically.
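The three paths can also be seen in code. The sketch below uses a common recursive formulation that returns the new subtree root rather than following the parent pointers of the pseudo code, but it covers the same three cases; the node type and `delete_item` name are illustrative assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct node {
    int item;
    struct node *left;
    struct node *right;
} node;

// Returns the largest item in a non-empty subtree (rightmost node).
static int max_item(const node *n) {
    while (n->right != NULL)
        n = n->right;
    return n->item;
}

// Deletes item from the subtree rooted at n and returns the new
// subtree root, handling the leaf, one-child, and two-children cases.
node *delete_item(node *n, int item) {
    if (n == NULL)
        return NULL;
    if (item < n->item) {
        n->left = delete_item(n->left, item);
    } else if (item > n->item) {
        n->right = delete_item(n->right, item);
    } else if (n->left == NULL || n->right == NULL) {
        // Zero or one child: splice the node out of the tree.
        node *child = (n->left != NULL) ? n->left : n->right;
        free(n);
        return child;
    } else {
        // Two children: replace with the largest item in the left
        // subtree, then delete that item from the left subtree.
        n->item = max_item(n->left);
        n->left = delete_item(n->left, n->item);
    }
    return n;
}
```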

## Source Code

Relevant Files:

Click here for build and run instructions

## Advantages

- *Search*: Optimized for quick search operations.
- *Insert\Delete*: Although slower than linked lists, considerably faster than arrays.
- *Utility*: The most utility of any list data structure.

## Disadvantages

- *Memory*: Each item maintains two additional pointers. The total size of a binary tree is the size of the items plus the size of two pointers times the number of items.
- *Cache Optimized*: It is possible to create a binary tree with poor spatial locality, which can have profound performance implications. See the Memory Cache Hierarchy section for more details.
- *Maintains Order*: The order in which items are inserted is lost.
- *Complexity*: The most complex list data structure.
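The pointer overhead is easy to inspect with `sizeof`. On a typical 64-bit platform the two child pointers dominate the struct; exact sizes are platform-dependent, and the node layout below is the illustrative one used throughout this section.

```c
#include <stddef.h>

// Each stored item carries 2 * sizeof(void *) bytes of pointer
// overhead, before any alignment padding the compiler inserts.
typedef struct node {
    int item;
    struct node *left;
    struct node *right;
} node;

// Overhead in bytes per item, ignoring alignment padding.
size_t pointer_overhead(void) {
    return 2 * sizeof(void *);
}
```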