Data Structures: Trees

A tree is a fundamental data structure used in computer science. The tree is a
useful data structure for rapidly storing sorted data and rapidly retrieving stored
data. A formal definition of a tree is usually stated as follows: A (general)
tree consists of a set of nodes that is either empty or has a root node to which
are attached zero or more subtrees. Of course, a subtree itself must be a tree.
Thus this is a recursive definition. There is no left to right ordering of the
subtrees.

An Example:

Basic terminology:

The root node (or simply root) is the node at the top of the tree diagram. In the
case above it is the one containing 110.

The parent of a node is the one directly connected to it from above. In the
example, 111 is the parent of 350, 230 is the parent of 310, 110 has no parent,
etc. A child is a node connected directly below the starting node. Thus 350 is a
child of 111, 310 is a child of 230, etc. It is simply the reverse of the parent
relationship. Nodes with the same parent are called siblings. So, 221, 230, and
350 are siblings in our example.

An ancestor of a given node is either the parent, the parent of the parent, the
parent of that, etc. In our example 110 is an ancestor of all of the other nodes in
the tree. The counterpart of ancestor is descendant. For example, 310 is a
descendant of 111, but 310 is not a descendant of 221.

The leaves of a tree (also called external nodes) are those nodes with no
children. In our example, 221, 350, 330, and 310 are leaves. Leaves do not have
to be at the lowest level in the tree. The other nodes of the tree are called non-
leaves (or sometimes internal nodes).

Each node of a tree can be considered to be the root of a subtree consisting of
that node and all its descendants. For example, the subtree rooted at 230 is as
follows:

A branch is a sequence of nodes such that the first is the parent of the second,
the second is the parent of the third, etc. For example, in the above tree, the
sequence 111, 230, 310 is a branch. The length of a branch is the number of line
segments traversed (which is one less than the number of nodes). The above
branch has length 2.

The height of a tree is the maximum length of a branch from the root to a leaf.
The above tree has height 3, since the longest possible branch from root to leaf
is either 110, 111, 230, 310 or 110, 111, 230, 330, both of which have length 3.
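
The terminology above can be sketched in code. The following is a minimal illustration, not a definitive implementation: the names tree_node, add_child, is_leaf, count_leaves, height and build_example are hypothetical, and the example tree is reconstructed from the relationships stated in the text (the left to right order of the children is an assumption).

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// Hypothetical node type for a general tree: any number of children.
struct tree_node
{
    int value;
    std::vector<std::unique_ptr<tree_node>> children;
    explicit tree_node(int v) : value(v) {}
};

// Attaches a new child holding v below parent and returns a pointer to it.
tree_node *add_child(tree_node &parent, int v)
{
    parent.children.push_back(std::make_unique<tree_node>(v));
    return parent.children.back().get();
}

// A leaf (external node) is a node with no children.
bool is_leaf(const tree_node &n)
{
    return n.children.empty();
}

// Counts the leaves in the subtree rooted at n.
int count_leaves(const tree_node &n)
{
    if (is_leaf(n))
        return 1;
    int total = 0;
    for (const auto &child : n.children)
        total += count_leaves(*child);
    return total;
}

// Height: the length, in line segments, of the longest branch from n to a leaf.
int height(const tree_node &n)
{
    int tallest = -1; // a leaf has no children, so its height comes out as 0
    for (const auto &child : n.children)
        tallest = std::max(tallest, height(*child));
    return tallest + 1;
}

// Builds the example tree as described in the text (child order assumed).
std::unique_ptr<tree_node> build_example()
{
    auto root = std::make_unique<tree_node>(110);
    tree_node *n111 = add_child(*root, 111);
    add_child(*n111, 221);
    tree_node *n230 = add_child(*n111, 230);
    add_child(*n111, 350);
    add_child(*n230, 330);
    add_child(*n230, 310);
    return root;
}
```

With this reconstruction, height(*build_example()) evaluates to 3 and count_leaves(*build_example()) to 4, in agreement with the text.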

The binary tree


The binary tree is a fundamental data structure used in computer science. The
binary tree is a useful data structure for rapidly storing sorted data and rapidly
retrieving stored data. A binary tree is composed of nodes, each of which stores
data and also links to up to two child nodes, which can be visualized spatially
as below the first node, with one placed to the left and one placed to the right.
It is the relationship between a parent node and the child nodes it links to
which makes the binary tree such an efficient data structure. The child on the
left has a lesser key value (i.e., the value used to search for a node in the
tree), and the child on the right has an equal or greater key value. (A binary
tree ordered in this way is a binary search tree, which is defined more precisely
later.) As a result, the nodes on the far left of the tree have the lowest values,
whereas the nodes on the far right of the tree have the greatest values. More
importantly, as each node links to up to two other nodes, it is the root of a
new, smaller binary tree. Due to this nature, it is possible to easily access and
insert data in a binary tree using search and insert functions called recursively
on successive nodes.

The typical graphical representation of a binary tree is essentially that of an
upside down tree. It begins with a root node, which contains the original key
value. The root node has up to two child nodes; each child node might have its
own child nodes. Ideally, the tree would be structured so that it is a perfectly
balanced tree, with each node having the same number of nodes in its left
subtree as in its right subtree. A perfectly balanced tree allows for the fastest
average insertion of data or retrieval of data. The worst case scenario is a tree
in which each node has only one child node, so that it behaves like a linked list
in terms of speed. The typical representation of a binary tree looks like the
following:

         10
       /    \
      6      14
     / \    /  \
    5   8  11   18

The node storing the 10, represented here merely as 10, is the root node, linking
to the left and right child nodes, with the left node storing a lower value than the
parent node, and the node on the right storing a greater value than the parent
node. Notice that if one removed the root node and its right subtree, the node
storing the value 6 would be the root of a new, smaller binary tree.

The structure of a binary tree makes the insertion and search functions simple to
implement using recursion. In fact, the insertion and search functions are also
very similar. Inserting data into a binary tree involves searching for an unused
position in the tree in which to insert the key value. The insert function is
generally a recursive function that continues moving down the levels of a binary
tree until it finds an unused position which follows the rules of placing nodes.
The rules are that a lower value should be to the left of the node, and a greater
or equal value should be to the right. Following the rules, an insert function
should check each node to see if it is empty; if so, it inserts the data to be
stored along with the key value (in most implementations, an empty node will
simply be a NULL pointer from a parent node, so the function also has to create
the node). If the node is filled already, the insert function should check whether
the key value to be inserted is less than the key value of the current node, and
if so, the insert function should be recursively called on the left child node.
Otherwise the key value to be inserted is greater than or equal to the key value
of the current node, and the insert function should be recursively called on the
right child node.

The search function works in a similar fashion. It should check whether the key
value of the current node is the value being searched for. If not, it should check
whether the value being searched for is less than the value of the node, in which
case it should be recursively called on the left child node; if it is greater than
the value of the node, it should be recursively called on the right child node. It
is also necessary to ensure that the left or right child node actually exists
before calling the function on the node.

Because a balanced binary tree with n nodes has about log (base 2) n layers, the
average search time for a binary tree is proportional to log (base 2) n. To fill an
entire binary tree, sorted, takes time roughly proportional to n * log (base 2) n.
Let's take a look at the necessary code for a simple implementation of a binary
tree. First, it is necessary to have a struct, or class, defined as a node.

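struct node
{
  int key_value; //The value used to order and search for the node
  node *left;    //Child holding lesser key values, or NULL
  node *right;   //Child holding greater or equal key values, or NULL
};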
The struct stores the key_value and contains pointers to the two child nodes,
which define the node as part of a tree. In fact, the node itself is very similar to
the node in a linked list. A basic knowledge of the code for a linked list will be
very helpful in understanding the techniques of binary trees. Essentially, pointers
are necessary to allow the arbitrary creation of new nodes in the tree.
It is most logical to create a binary tree class to encapsulate the workings of the
tree in a single place and to make it reusable. The class will contain
functions to insert data into the tree and to search for data. Due to the use of
pointers, it is necessary to include a function to delete the tree in order to
release its memory once the tree is no longer needed.
class btree
{
  public:
    btree();
    ~btree();

    void insert(int key);
    node *search(int key);
    void destroy_tree();

  private:
    void destroy_tree(node *leaf);
    void insert(int key, node *leaf);
    node *search(int key, node *leaf);

    node *root;
};

The insert and search functions that are public members of the class are
designed to allow the user of the class to use the class without dealing with the
underlying design. The insert and search functions which will be called
recursively are the ones which contain two parameters, allowing them to travel
down the tree. The destroy_tree function without arguments is a front for the
destroy_tree function which will recursively destroy the tree, node by node, from
the bottom up.
The code for the class would look similar to the following:
btree::btree()
{
  root=NULL;
}
It is necessary to initialize root to NULL for the later functions to be able to
recognize that it does not exist.
btree::~btree()
{
  destroy_tree();
}
The destroy_tree function will set off the recursive function destroy_tree shown
below which will actually delete all nodes of the tree.
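void btree::destroy_tree(node *leaf)
{
  if(leaf!=NULL)
  {
    destroy_tree(leaf->left);  //Delete the whole left subtree first
    destroy_tree(leaf->right); //Then the whole right subtree
    delete leaf;               //Finally delete the node itself
  }
}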

The function destroy_tree goes to the bottom of each part of the tree, that is,
it descends while there is a non-null node, deletes that leaf, and then works its
way back up. The function deletes the leftmost node, then the right child node
of the leftmost node's parent node, then it deletes the parent node, then
works its way back to deleting the other child node of the parent of the node it
just deleted, and it continues this deletion working its way up to the node of the
tree upon which destroy_tree was originally called. In the example tree above, the
order of deletion of nodes would be 5 8 6 11 18 14 10. It is necessary to delete
all the child nodes to avoid wasting memory.

void btree::insert(int key, node *leaf)
{
  if(key< leaf->key_value)
  {
    if(leaf->left!=NULL)
      insert(key, leaf->left);
    else
    {
      leaf->left=new node;
      leaf->left->key_value=key;
      leaf->left->left=NULL;    //Sets the left child of the child node to null
      leaf->left->right=NULL;   //Sets the right child of the child node to null
    }
  }
  else if(key>=leaf->key_value)
  {
    if(leaf->right!=NULL)
      insert(key, leaf->right);
    else
    {
      leaf->right=new node;
      leaf->right->key_value=key;
      leaf->right->left=NULL;  //Sets the left child of the child node to null
      leaf->right->right=NULL; //Sets the right child of the child node to null
    }
  }
}
The case where the root node is still NULL will be taken care of by the insert
function that is nonrecursive and available to non-members of the class. The
insert function searches, moving down the tree of children nodes, following the
prescribed rules, left for a lower value to be inserted and right for a greater
value, until it finds an empty node which it creates using the 'new' keyword and
initializes with the key value while setting the new node's child node pointers to
NULL. After creating the new node, the insert function will no longer call itself.

node *btree::search(int key, node *leaf)
{
  if(leaf!=NULL)
  {
    if(key==leaf->key_value)
      return leaf;
    if(key<leaf->key_value)
      return search(key, leaf->left);
    else
      return search(key, leaf->right);
  }
  else return NULL;
}

The search function shown above recursively moves down the tree until it either
reaches a node with a key value equal to the value for which the function is
searching or reaches a NULL pointer, meaning that the value being searched for
is not stored in the binary tree. It returns a pointer to the matching node (or
NULL) to the previous instance of the function which called it, handing the
pointer back up to the search function accessible outside the class.
void btree::insert(int key)
{
  if(root!=NULL)
    insert(key, root);
  else
  {
    root=new node;
    root->key_value=key;
    root->left=NULL;
    root->right=NULL;
  }
}

The public version of the insert function takes care of the case where the root
has not been initialized by allocating the memory for it and setting both child
nodes to NULL and setting the key_value to the value to be inserted. If the root
node already exists, insert is called with the root node as the initial node of the
function, and the recursive insert function takes over.
node *btree::search(int key)
{
  return search(key, root);
}

The public version of the search function is used to set off the search recursion
at the root node, keeping it from being necessary for the user to have access to
the root node.
void btree::destroy_tree()
{
  destroy_tree(root);
  root=NULL; //Prevents a second delete if the tree is destroyed again, e.g. by the destructor
}
The public version of the destroy tree function is merely used to initialize the
recursive destroy_tree function which then deletes all the nodes of the tree.

Binary search tree

A binary search tree is a binary tree in which the data in the nodes is ordered
in a particular way. To be precise, starting at any given node, the data in any
nodes of its left subtree must all be less than the item in the given node, and the
data in any nodes of its right subtree must be greater than or equal to the data
in the given node. Of course, all of this implies that the data items can be
ordered by some sort of less than relationship. For numbers this can obviously be
done. For strings, alphabetical ordering is often used. For records of data, a
comparison based on a particular field (the key field) is often used.
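
The ordering rule can be checked mechanically. The sketch below is one possible way to do so, reusing the node struct from earlier; the helper names is_bst_in_range and is_bst are hypothetical and not part of the btree class above. It narrows the range of permitted keys while descending, so that every node is compared against all of its ancestors rather than only its parent.

```cpp
#include <climits>

struct node
{
    int key_value;
    node *left;
    node *right;
};

// True if every key in the subtree rooted at n lies in [low, high] and the
// rule "left keys < node key <= right keys" holds at every node.
bool is_bst_in_range(const node *n, long long low, long long high)
{
    if (n == nullptr)
        return true; // an empty subtree is trivially ordered
    if (n->key_value < low || n->key_value > high)
        return false;
    // Keys to the left must be strictly smaller than this key; keys to the
    // right must be greater than or equal to it.
    return is_bst_in_range(n->left, low, static_cast<long long>(n->key_value) - 1) &&
           is_bst_in_range(n->right, n->key_value, high);
}

// Convenience wrapper that starts with no restriction on the keys.
bool is_bst(const node *root)
{
    return is_bst_in_range(root, LLONG_MIN, LLONG_MAX);
}
```

Checking only each node against its own children would not be enough: a key of 12 stored as the right child of 6 in the earlier example tree satisfies the rule locally, yet it sits in the left subtree of 10 and so violates the definition.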

The following is a binary search tree where each node contains a person's name.
Only first names are used in order to keep the example simple. Note that the
names are ordered alphabetically so that DAVE comes before DAWN, DAVID
comes before DAWN but after DAVE, etc.

Creating a binary search tree: One way to do so is by starting with an empty
binary search tree and adding the data items one by one. The first item becomes
the root. The next item is placed in either a left child or right child node,
depending on the ordering. The third item is compared to the root, we go left or
right depending on the result of the comparison, etc. until we find the spot for it.
In each case we follow a path from the root to the spot where we insert the new
item, comparing the new item to each item in the nodes encountered along the
way, going left or right at each node so as to maintain the ordering prescribed
above.

Traversals of Binary Trees:

There are several well-known ways to traverse, to travel throughout, a binary
tree. We will look at three of them. The first is an inorder traversal. This
consists of three overall steps: traverse the left subtree (recursively), visit the
root node, and traverse the right subtree (recursively). When we "visit" a node
we typically do some processing on it, such as printing out the contents of the
node. For example, an inorder traversal of the above binary search tree gives us
the names in this order:

BETH
CINDI
DAVE
DAVID
DAWN
GINA
MIKE
PAT
SUE

Note that we got the data back in ascending order. This will always happen when
doing an inorder traversal of a binary search tree. In fact, a sort can be done this
way. One first inserts the data into a binary search tree and then does an inorder
traversal to obtain the data in ascending order. Some people call this a tree
sort.
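
A tree sort along these lines might look as follows. This is a sketch under stated assumptions: name_node, insert, inorder and tree_sort are hypothetical names, the keys are std::string values compared with the built-in < operator (plain character-by-character ordering), and nodes are owned through std::unique_ptr so that no explicit destroy function is needed.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type holding a name instead of an integer key.
struct name_node
{
    std::string name;
    std::unique_ptr<name_node> left;
    std::unique_ptr<name_node> right;
    explicit name_node(std::string n) : name(std::move(n)) {}
};

// Inserts name below subtree: smaller names go left, equal or larger go right.
void insert(std::unique_ptr<name_node> &subtree, const std::string &name)
{
    if (!subtree)
        subtree = std::make_unique<name_node>(name);
    else if (name < subtree->name)
        insert(subtree->left, name);
    else
        insert(subtree->right, name);
}

// Inorder traversal: left subtree, then the node itself, then right subtree.
void inorder(const name_node *n, std::vector<std::string> &out)
{
    if (n == nullptr)
        return;
    inorder(n->left.get(), out);
    out.push_back(n->name); // "visiting" the node here means recording its name
    inorder(n->right.get(), out);
}

// Tree sort: insert every item, then read the items back in ascending order.
std::vector<std::string> tree_sort(const std::vector<std::string> &items)
{
    std::unique_ptr<name_node> root;
    for (const auto &item : items)
        insert(root, item);
    std::vector<std::string> sorted;
    inorder(root.get(), sorted);
    return sorted;
}
```

Calling tree_sort on the names DAWN, DAVE, MIKE, BETH, DAVID, GINA, PAT, CINDI, SUE (an insertion order assumed here for illustration) returns them in the ascending order listed above.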

Now, how exactly did we get the above list of data? Essentially, we did so by
following the recursive definition for an inorder traversal. First we traverse the
left subtree of the root, DAWN. That left subtree is the one rooted at DAVE. How
do we traverse it? By using the same three-step process. We first traverse its left
subtree, the one rooted at BETH. Of course, we then have to go through the
three steps on the subtree rooted at BETH. We begin by traversing its left
subtree, but it is empty, so we visit the root, BETH. That is the first data item
printed. Then we traverse the right subtree, the one rooted at CINDI. We use the
three-step process on it, but since its subtrees are empty, we simply print the
root, CINDI, which is the second item printed. We then back up to where we left
off with the subtree rooted at DAVE. We have now traversed its left subtree, so
we go on to print the root, DAVE, and then traverse the right subtree. Since the
right subtree itself has empty subtrees, we end up just printing its root, DAVID.
We continue in a similar fashion for the rest of this binary search tree.

The other two traversals that we will study are the preorder traversal and
the postorder traversal. They are very similar to the inorder traversal in that
they consist of the same three steps, but reordered slightly. The preorder
traversal puts the step of visiting the root first. The postorder traversal puts the
step of visiting the root last. Everything else stays the same. Here is an outline of
the steps for all three of our traversals.

Preorder traversal
1. Visit the root
2. Traverse the left subtree
3. Traverse the right subtree

Inorder traversal
1. Traverse the left subtree
2. Visit the root
3. Traverse the right subtree

Postorder traversal
1. Traverse the left subtree
2. Traverse the right subtree
3. Visit the root
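
The three outlines translate almost line for line into recursive functions. The sketch below reuses the node struct from the btree code; the function names preorder, inorder and postorder are hypothetical, and "visiting" a node is taken to mean appending its key to a vector rather than printing it.

```cpp
#include <vector>

struct node
{
    int key_value;
    node *left;
    node *right;
};

// Preorder: visit the root, then the left subtree, then the right subtree.
void preorder(const node *n, std::vector<int> &visited)
{
    if (n == nullptr)
        return;
    visited.push_back(n->key_value);
    preorder(n->left, visited);
    preorder(n->right, visited);
}

// Inorder: left subtree, then the root, then the right subtree.
void inorder(const node *n, std::vector<int> &visited)
{
    if (n == nullptr)
        return;
    inorder(n->left, visited);
    visited.push_back(n->key_value);
    inorder(n->right, visited);
}

// Postorder: left subtree, then the right subtree, then the root.
void postorder(const node *n, std::vector<int> &visited)
{
    if (n == nullptr)
        return;
    postorder(n->left, visited);
    postorder(n->right, visited);
    visited.push_back(n->key_value);
}
```

On the earlier example tree with root 10, these give 10 6 5 8 14 11 18 (preorder), 5 6 8 10 11 14 18 (inorder) and 5 8 6 11 18 14 10 (postorder). The postorder result is the same order in which destroy_tree deletes the nodes.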

For example, consider a postorder traversal of the binary expression tree for
4 * 5 - 3. The tree is as shown:

First we traverse the left subtree of the whole binary tree. This is the subtree
rooted at *. To do so, we apply our three steps. We traverse its left subtree,
which results in printing 4. Then we traverse the right subtree, which results in
printing 5. Then we visit the root, printing *. Next, we back up to where we left
off with the whole binary tree. We have now traversed the left subtree, so we
traverse the right subtree, printing 3. Then we visit the root, printing -. Overall
we end up printing 4 5 * 3 -, the postfix form of the expression. A postfix
expression is deciphered by looking at it left to right and using the fact that each
operator (such as *) applies to the two previous values. In general, a postorder
traversal of a binary expression tree yields the postfix form of the expression.
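
The rule that each operator applies to the two previous values is usually implemented with a stack. The following sketch assumes a space-separated postfix string of integers and the operators + - * /; the function name eval_postfix is hypothetical.

```cpp
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

// Evaluates a space-separated postfix expression such as "4 5 * 3 -".
int eval_postfix(const std::string &expr)
{
    std::istringstream in(expr);
    std::stack<int> values;
    std::string token;
    while (in >> token)
    {
        if (token == "+" || token == "-" || token == "*" || token == "/")
        {
            // An operator consumes the two most recent values.
            if (values.size() < 2)
                throw std::invalid_argument("operator needs two previous values");
            int rhs = values.top();
            values.pop();
            int lhs = values.top();
            values.pop();
            if (token == "+")
                values.push(lhs + rhs);
            else if (token == "-")
                values.push(lhs - rhs);
            else if (token == "*")
                values.push(lhs * rhs);
            else
            {
                if (rhs == 0)
                    throw std::invalid_argument("division by zero");
                values.push(lhs / rhs); // integer division
            }
        }
        else
        {
            values.push(std::stoi(token)); // throws if the token is not a number
        }
    }
    if (values.size() != 1)
        throw std::invalid_argument("malformed postfix expression");
    return values.top();
}
```

For the expression above, eval_postfix("4 5 * 3 -") first replaces 4 and 5 by 20, then applies - to 20 and 3, giving 17.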

A preorder traversal of a binary expression tree always gives the prefix form of
the expression.

The natural conjecture, then, would be that the inorder traversal of a binary
expression tree would produce the infix form of the expression, but that is not
quite true. With the above expression it is true. However, try the infix
expression (12 - 3) * 2. Here parentheses are used to indicate that the
subtraction should be done before the multiplication. The binary expression tree
looks like this:

An inorder traversal of this binary expression tree produces 12 - 3 * 2, which is
the infix form of a slightly different expression, one in which the multiplication is
done before the subtraction. The problem is that we do not get the parentheses
back. It is possible to modify the code for an inorder traversal so that it always
parenthesizes things, but a plain inorder traversal does not give any
parentheses.
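
The difference between the two inorder variants can be sketched as follows. The type expr_node and the helpers leaf, op, plain_inorder and parenthesized_inorder are hypothetical names, and the sketch assumes that every operator node has exactly two children.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical node of a binary expression tree: a number or an operator.
struct expr_node
{
    std::string token;
    std::unique_ptr<expr_node> left;
    std::unique_ptr<expr_node> right;
};

// Creates a leaf holding an operand such as "12".
std::unique_ptr<expr_node> leaf(const std::string &value)
{
    auto n = std::make_unique<expr_node>();
    n->token = value;
    return n;
}

// Creates an operator node with the given left and right operands.
std::unique_ptr<expr_node> op(const std::string &symbol,
                              std::unique_ptr<expr_node> lhs,
                              std::unique_ptr<expr_node> rhs)
{
    auto n = leaf(symbol);
    n->left = std::move(lhs);
    n->right = std::move(rhs);
    return n;
}

// Plain inorder traversal: the grouping encoded in the tree shape is lost.
std::string plain_inorder(const expr_node *n)
{
    if (n->left == nullptr)
        return n->token; // operand
    return plain_inorder(n->left.get()) + " " + n->token + " " +
           plain_inorder(n->right.get());
}

// Modified inorder traversal: wraps every operator application in parentheses.
std::string parenthesized_inorder(const expr_node *n)
{
    if (n->left == nullptr)
        return n->token; // operand
    return "(" + parenthesized_inorder(n->left.get()) + " " + n->token + " " +
           parenthesized_inorder(n->right.get()) + ")";
}
```

For the tree built by op("*", op("-", leaf("12"), leaf("3")), leaf("2")), plain_inorder yields 12 - 3 * 2 while parenthesized_inorder yields ((12 - 3) * 2). The second form contains redundant outer parentheses but preserves the intended order of evaluation.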

Uses of a Binary Search Tree

A binary search tree can be a very useful data structure. It can be used to create
a sort routine. Such a sort routine is normally pretty fast. In fact, it is Theta(n *
lg(n)) on the average. However, it does have a bad worst case, namely when
the data is already in ascending or descending order, in which case it degrades
to Theta(n^2).
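
The bad worst case is easy to observe by measuring the height of the tree that results from a given insertion order. The sketch below uses hypothetical names (bst_node, insert, height, height_after_inserting) and measures height as branch length, with an empty tree assigned -1 so that a single node has height 0.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// Hypothetical node type for this experiment.
struct bst_node
{
    int key;
    std::unique_ptr<bst_node> left;
    std::unique_ptr<bst_node> right;
    explicit bst_node(int k) : key(k) {}
};

// Plain binary search tree insertion with no rebalancing.
void insert(std::unique_ptr<bst_node> &subtree, int key)
{
    if (!subtree)
        subtree = std::make_unique<bst_node>(key);
    else if (key < subtree->key)
        insert(subtree->left, key);
    else
        insert(subtree->right, key);
}

// Longest branch length from n down to a leaf; -1 for an empty tree.
int height(const bst_node *n)
{
    if (n == nullptr)
        return -1;
    return 1 + std::max(height(n->left.get()), height(n->right.get()));
}

// Builds a tree by inserting the keys in the given order and reports its height.
int height_after_inserting(const std::vector<int> &keys)
{
    std::unique_ptr<bst_node> root;
    for (int key : keys)
        insert(root, key);
    return height(root.get());
}
```

Inserting 1 2 3 4 5 6 7 in that order gives height 6, since every node has only a right child and the tree is effectively a linked list. Inserting the same keys in the order 4 2 6 1 3 5 7 gives height 2.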

Another use of a binary search tree is in storing data items for fast lookup later.
In the average case it is pretty fast to insert a new item into a binary tree,
because in the average case the data is fairly random and the binary tree is
reasonably "bushy". (In such a tree it is known that the height of the binary tree
is Theta(lg(n)), so that insertion is a Theta(lg(n)) operation.) Similarly, doing a
lookup of an item already in the binary tree follows the same pattern as used
when it was inserted. Thus lookup is Theta(lg(n)) on average.

For example, to look up GINA in the binary tree above, one compares GINA to
DAWN, the root. Since GINA is larger, move to the right child, MIKE. Now
compare GINA to MIKE. Since GINA is smaller, move to the left child GINA. Now
compare GINA to the item in the node, also GINA, and we see that we have a
match. All lookups are like this. One starts at the root and follows a path from the
root to the matching item (or to a leaf if no match is ever found).

AVL Trees
An AVL tree is a special type of binary tree that is always "partially" balanced.
The criterion used to determine the "level" of "balanced-ness" is the
difference between the heights of the subtrees of a root in the tree.

The "height" of a tree is the "number of levels" in the tree. (For a non-empty
tree this is one more than the height measured as branch length earlier.) Or to
be more formal, the height of a tree is defined as follows:
1. The height of a tree with no elements is 0
2. The height of a tree with 1 element is 1
3. The height of a tree with > 1 element is equal to 1 + the height of its tallest
subtree.
An AVL tree is a binary tree in which the difference between the heights of the
right and left subtrees of any node is never more than one.

The idea behind maintaining the "AVL-ness" of an AVL tree is that whenever we
insert or delete an item, if we have "violated" the "AVL-ness" of the tree in
any way, we must then restore it by performing a set of manipulations (called
"rotations") on the tree.

These rotations come in two kinds: single rotations and double rotations.

AVL trees are self-adjusting, height-balanced binary search trees and are named
after their inventors, Adelson-Velskii and Landis. A balanced binary search tree
has Theta(lg n) height and hence Theta(lg n) worst case lookup and insertion
times. However, ordinary binary search trees have a bad worst case. When
sorted data is inserted, the binary search tree is very unbalanced, essentially
more of a linear list, with Theta(n) height and thus Theta(n) worst case insertion
and lookup times. AVL trees overcome this problem.

An AVL tree is a binary search tree in which every node is height balanced, that
is, the difference in the heights of its two subtrees is at most 1. The balance
factor of a node is the height of its right subtree minus the height of its left
subtree. An equivalent definition, then, for an AVL tree is that it is a binary
search tree in which each node has a balance factor of -1, 0, or +1. Note that a
balance factor of -1 means that the subtree is left-heavy, and a balance factor of
+1 means that the subtree is right-heavy. For example, in the following AVL tree,
note that the root node with balance factor +1 has a right subtree of height 1
more than the height of the left subtree. (The balance factors are shown at the
top of each node.)
              +1
              30
            /    \
         -1        0
         22        62
        /        /    \
       0      +1        -1
       5      44        95
                \      /
                 0    0
                 51   77
The idea is that an AVL tree is close to being completely balanced. Hence it
should have Theta(lg n) height (it does - always) and so have Theta(lg n) worst
case insertion and lookup times. An AVL tree does not have a bad worst case,
like a binary search tree which can become very unbalanced and give Theta(n)
worst case lookup and insertion times. The following binary search tree is not an
AVL tree. Notice the balance factor of -2 at node 70.
                   -1
                   100
                /       \
            -2            -1
            70            150
          /    \        /     \
       +1       0     +1       0
       30       80    130      180
      /  \               \
     0    -1              0
     10   40              140
          /
         0
         36
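
The balance factors in the two trees above can be recomputed from the definitions. The following is a minimal sketch with hypothetical function names (height, balance_factor, is_avl_balanced); it uses the "number of levels" convention for height and checks only the balance condition, not the binary search tree ordering. Recomputing heights at every node is simple but slow; practical AVL implementations typically store a height or balance factor in each node instead.

```cpp
#include <algorithm>
#include <cstdlib>

struct node
{
    int key_value;
    node *left;
    node *right;
};

// Height as the number of levels: 0 for an empty tree, 1 for a single node.
int height(const node *n)
{
    if (n == nullptr)
        return 0;
    return 1 + std::max(height(n->left), height(n->right));
}

// Balance factor: height of the right subtree minus height of the left subtree.
int balance_factor(const node *n)
{
    return height(n->right) - height(n->left);
}

// True if every node in the subtree has a balance factor of -1, 0, or +1.
bool is_avl_balanced(const node *n)
{
    if (n == nullptr)
        return true;
    return std::abs(balance_factor(n)) <= 1 &&
           is_avl_balanced(n->left) &&
           is_avl_balanced(n->right);
}
```

Applied to the first tree, balance_factor gives +1 at 30, -1 at 22 and 0 at 62, and is_avl_balanced returns true. Applied to the second tree, balance_factor gives -2 at 70, so is_avl_balanced returns false.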

INSERTING A NEW ITEM

Initially, a new item is inserted just as in a binary search tree. Note that the item always goes into
a new leaf. The tree is then readjusted as needed in order to maintain it as an AVL tree. There
are three main cases to consider when inserting a new node.

CASE 1:

A node with balance factor 0 changes to +1 or -1 when a new node is inserted below it. No
change is needed at this node. Consider the following example. Note that after an insertion one
only needs to check the balances along the path from the new leaf to the root.

CASE 2:

A node with balance factor -1 changes to 0 when a new node is inserted in its right subtree.
(Similarly for +1 changing to 0 when inserting in the left subtree.) No change is needed at this
node.

CASE 3:

A node with balance factor -1 changes to -2 when a new node is inserted in its left subtree.
(Similarly for +1 changing to +2 when inserting in the right subtree.) Change is needed at this
node. The tree is restored to an AVL tree by using a rotation.

SUBCASE A:

This consists of the following situation, where P denotes the parent of the subtree being
examined, LC is P's left child, and X is the new node added. Note that inserting X makes P have
a balance factor of -2 and LC have a balance factor of -1. The -2 must be fixed. This is
accomplished by doing a right rotation at P. Note that rotations do not mess up the order of the
nodes given in an inorder traversal. This is very important since it means that we still have a
legitimate binary search tree. (The mirror image situation is also included under subcase A.)

              (rest of tree)
                    |
                   -2
                    P
                  /   \
               -1      subtree of
               LC      height n
              /  \
    subtree of    subtree of
    height n      height n
       /
      X

The fix is to use a single right rotation at node P. (In the mirror image case a single left
rotation is used at P.) This gives the following picture.

              (rest of tree)
                    |
                    0
                    LC
                  /    \
        subtree of      P
        height n       /  \
           /  subtree of   subtree of
          X   height n     height n

Recall that the mirror image situation is also included under subcase A. The following is a
general illustration of this situation. The fix is to use a single left rotation at P. See if you can
draw a picture of the following after the left rotation at P. Then draw a picture of a particular
example that fits our general picture below and fix it with a left rotation.

              (rest of tree)
                    |
                   +2
                    P
                  /   \
        subtree of     +1
        height n       RC
                      /  \
            subtree of    subtree of
            height n      height n
                              \
                               X
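
In code, a single rotation is only a matter of relinking three pointers. The sketch below is one possible formulation using the node struct from the btree code; the function names rotate_right and rotate_left are hypothetical. Each function returns the new root of the rotated subtree, which the caller must attach to the rest of the tree in place of P. Updating stored balance factors is deliberately left out.

```cpp
struct node
{
    int key_value;
    node *left;
    node *right;
};

// Single right rotation at p. Assumes p and p->left (LC) are not null.
// LC becomes the root of the subtree and P becomes LC's right child.
node *rotate_right(node *p)
{
    node *lc = p->left;
    p->left = lc->right; // LC's right subtree holds keys between LC and P
    lc->right = p;
    return lc;
}

// Single left rotation at p, the mirror image. Assumes p and p->right (RC)
// are not null. RC becomes the root and P becomes RC's left child.
node *rotate_left(node *p)
{
    node *rc = p->right;
    p->right = rc->left; // RC's left subtree holds keys between P and RC
    rc->left = p;
    return rc;
}
```

For instance, in the left-leaning chain 30, 20, 10 (each node the left child of the previous one), a right rotation at 30 makes 20 the root with 10 on its left and 30 on its right. An inorder traversal gives 10 20 30 both before and after the rotation.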

SUBCASE B:
This consists of the following situation, where P denotes the parent of the subtree being
examined, LC is P's left child, NP is the node that will be the new parent, and X is the new node
added. X might be added to either of the subtrees of height n-1. Note that inserting X makes P
have a balance factor of -2 and LC have a balance factor of +1. The -2 must be fixed. This is
accomplished by doing a double rotation at P (explained below). (Note that the mirror image
situation is also included under subcase B.)

              (rest of tree)
                    |
                   -2
                    P
                  /   \
               +1      subtree of
               LC      height n
              /  \
    subtree of    -1
    height n      NP
                 /  \
       subtree of    subtree of
       height n-1    height n-1
          /
         X

The fix is to use a double right rotation at node P. A double right rotation at P consists of a
single left rotation at LC followed by a single right rotation at P. (In the mirror image case a
double left rotation is used at P. This consists of a single right rotation at the right child RC
followed by a single left rotation at P.) In the above picture, the double rotation gives the
following (where we first show the result of the left rotation at LC, then a new picture for the
result of the right rotation at P).

              (rest of tree)
                    |
                   -2
                    P
                  /   \
               -2      subtree of
               NP      height n
              /  \
            0     subtree of
            LC    height n-1
           /  \
 subtree of    subtree of
 height n      height n-1
                  /
                 X

Finally we have the following picture after doing the right rotation at P.

                    (rest of tree)
                          |
                          0
                          NP
                      /        \
                   0              +1
                   LC             P
                 /    \         /    \
       subtree of  subtree of  subtree of  subtree of
       height n    height n-1  height n-1  height n
                      /
                     X
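
A double rotation is simply two single rotations applied in sequence. The sketch below repeats two hypothetical single-rotation helpers (rotate_left, rotate_right) so that it is self-contained, and adds double_rotate_right and double_rotate_left. As before, the functions return the new subtree root and do not maintain stored balance factors.

```cpp
struct node
{
    int key_value;
    node *left;
    node *right;
};

// Single right rotation at p; assumes p->left is not null.
node *rotate_right(node *p)
{
    node *lc = p->left;
    p->left = lc->right;
    lc->right = p;
    return lc;
}

// Single left rotation at p; assumes p->right is not null.
node *rotate_left(node *p)
{
    node *rc = p->right;
    p->right = rc->left;
    rc->left = p;
    return rc;
}

// Double right rotation at p: a left rotation at LC (p->left) followed by a
// right rotation at p. Assumes p->left and p->left->right (NP) are not null.
node *double_rotate_right(node *p)
{
    p->left = rotate_left(p->left); // NP moves up to become p's left child
    return rotate_right(p);         // NP then becomes the subtree root
}

// Double left rotation at p, the mirror image: a right rotation at RC
// (p->right) followed by a left rotation at p.
node *double_rotate_left(node *p)
{
    p->right = rotate_right(p->right);
    return rotate_left(p);
}
```

As a small instance of subcase B, take P = 30 with left child LC = 10, where 10 has right child NP = 20. Calling double_rotate_right on 30 returns 20 as the new root, with 10 as its left child and 30 as its right child.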
