
CS61BL Summer 2016

Lecture 4
Trees
Trees
A tree consists of:

● A set of nodes.
● A set of edges which connect pairs of nodes.

Requirements:

● Connected - there is a path from any node to any other node along the edges.
● If there are N nodes, there are N-1 edges.

Can be rephrased as: there is exactly one path between each pair of nodes.
Common Example: File Folders
[Diagram: a directory tree rooted at /home/alanyao. Second level: games/, images/, downloads/, school/. Third level: sc2/, overwatch/, pdfs/, cs61bl/, old/. Fourth level: lectures/, labs/.]
Trees
Rooted tree definitions:

● A single node is designated as the root of the tree.


● A node c has a single parent p, unless it is the root, in which case it has no parent.
● c is the child of p. Each node can have any number of children.
● a is an ancestor of d if it exists on the path from d to the root.
● If a is an ancestor of d, then d is a descendant of a.
● If a node has no children, it is a leaf.
Common Example: File Folders

[Diagram: the same directory tree; /home/alanyao is labeled as the root, and the bottom-level directories such as lectures/ and labs/ are labeled as leaves.]
Trees
Rooted tree metrics:

● The length of a path is the number of edges in the path.


● The depth of a node n is the length of the path from n to the root.
○ The depth of the root is zero.
● The height of a node n is the length of the path from n to its deepest
descendant. The height of a leaf node is zero.
○ The height of a tree is the height of the root.
● The subtree rooted at node n is the tree formed by n and its descendants.
Common Example: File Folders

[Diagram: the same directory tree annotated with metrics: the root /home/alanyao has depth 0 and height 3 (so the height of the tree is 3); the second-level directories have depth 1; leaves such as lectures/ and labs/ have height 0.]
Trees: Traversals: Preorder
class TreeNode {
    List<TreeNode> children;

    void preorder() {
        this.visit();  // do something with this node first
        children.forEach(TreeNode::preorder);  // then recurse on each child
    }
}

[Diagram: example tree with nodes numbered in preorder visit order: root 1; its children 2 and 6; node 2's children 3, 4, 5; node 6's children 7, 8.]

Runtime: ϴ(N), where N is the size of the tree. Each node is visited only once.

Example: Naturally prints the listing of a directory tree, if visit just prints a name.
Trees: Traversals: Postorder
class TreeNode {
    List<TreeNode> children;

    void postorder() {
        children.forEach(TreeNode::postorder);  // recurse on each child first
        this.visit();  // then do something with this node
    }
}

[Diagram: the same example tree with nodes numbered in postorder visit order: leaves 1, 2, 3 before their parent 4; leaves 5, 6 before their parent 7; the root 8 last.]

Runtime: ϴ(N), where N is the size of the tree. Each node is visited only once.

Example: Computes disk space usage. Subdirectories (children) are computed (visited) first.
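As a concrete illustration of that last point, here is a hedged sketch (field and method names are made up, not the lab's TreeNode) of computing disk usage with a postorder-style traversal:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: total disk usage via a postorder traversal.
class FileNode {
    List<FileNode> children = new ArrayList<>();
    long ownBytes;  // size of this entry itself

    long totalBytes() {
        long total = ownBytes;
        for (FileNode child : children) {
            total += child.totalBytes();  // subdirectories (children) first
        }
        return total;
    }
}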
Binary Search Trees
● A rooted tree, where each node has at most two children.
● Stored keys must be comparable.
● The left subtree of a node n contains only keys less than n's key, and the
right subtree contains only keys greater than n's key.

Think of it as an ordered dictionary. A dictionary maps keys to values, like words and their definitions. A BST requires that these keys have a total ordering.


Binary Search Trees: Representation & Inorder
class BST<K extends Comparable<K>, V> {
    TreeNode root;

    class TreeNode {
        Entry<K, V> item;  // Key displayed and ordered on; also stores the associated value
        TreeNode left;
        TreeNode right;

        void inorder() {
            if (left != null) left.inorder();
            this.visit();
            if (right != null) right.inorder();
        }
    }
}

[Diagram: example BST with root 3; 3's children are 2 and 6; 2's left child is 1; 6's children are 4 and 7; 4's right child is 5; 7's right child is 9; 9's left child is 8. The in-order traversal shown above visits the keys in sorted order, 1 through 9.]
Binary Search Trees: Search
public V find(K k) {
    TreeNode node = root;
    while (node != null) {
        int comp = k.compareTo(node.item.key);
        if (comp < 0) {
            node = node.left;
        } else if (comp > 0) {
            node = node.right;
        } else {  // The keys are equal
            return node.item.value;
        }
    }
    return null;
}

[Diagram: the same BST, tracing find(5): 5 > 3, go right to 6; 5 < 6, go left to 4; 5 > 4, go right and find 5.]
Binary Search Trees: Runtime
Height of a perfectly balanced tree: h = log2(n).

Each operation depends on the height, so on a balanced tree each operation runs in O(log2(n)).

Worst case: O(n), when the tree is completely unbalanced.

[Diagram: a degenerate BST that is just a chain of nodes 1, 2, 3, 5, each with a single child.]
Binary Search Trees: Operations
Other typical Map<K, V> operations.

● insert(K, V)
● remove(K)

Ordering-specific operations

● min(), max()
● getAllGreaterThan(K), getAllLessThan(K)
● getNthLargest(int n)

Your job is to implement some of these! (One possible insert is sketched below.)
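As an illustration, here is a minimal sketch of what a recursive insert could look like for the BST class shown earlier. The helper name and the Entry constructor are assumptions for this sketch, not the lab's actual API.

public void insert(K key, V value) {
    root = insertHelper(root, key, value);
}

private TreeNode insertHelper(TreeNode node, K key, V value) {
    if (node == null) {
        TreeNode leaf = new TreeNode();
        leaf.item = new Entry<>(key, value);  // assumed two-argument constructor
        return leaf;
    }
    int comp = key.compareTo(node.item.key);
    if (comp < 0) {
        node.left = insertHelper(node.left, key, value);
    } else if (comp > 0) {
        node.right = insertHelper(node.right, key, value);
    } else {
        node.item.value = value;  // key already present: replace its value
    }
    return node;
}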


Hashing
(quick break)
Review: How fast are add and contains?

Data structure               add(E element) runtime        contains(Object o) runtime
LinkedList                   ?                             ?
ArrayList (resizing array)   ?                             ?
TreeSet (balanced BST)       ?                             ?
Review: How fast are add and contains?

Data structure               add(E element) runtime        contains(Object o) runtime
LinkedList                   ϴ(1)                          ϴ(n)
ArrayList (resizing array)   ?                             ?
TreeSet (balanced BST)       ?                             ?

[Diagram: a linked list holding the elements 2, 1, 9.]
Review: How fast are add and contains?

Data structure               add(E element) runtime        contains(Object o) runtime
LinkedList                   ϴ(1)                          ϴ(n)
ArrayList (resizing array)   ϴ(1) (due to amortization)    ϴ(n)
TreeSet (balanced BST)       ?                             ?

[Diagram: an ArrayList holding 2, 1, 9; when adding 3 and 4, the underlying array resizes, leaving 2, 1, 9, 3, 4.]
Review: How fast are add and contains?

Data structure               add(E element) runtime        contains(Object o) runtime
LinkedList                   ϴ(1)                          ϴ(n)
ArrayList (resizing array)   ϴ(1) (due to amortization)    ϴ(n)
TreeSet (balanced BST)       ϴ(log n)                      ϴ(log n)

[Diagram: a balanced BST with root 7; 7's children are 2 and 9; 2's children are 1 and 4; 9's children are 8 and 11.]
Can we do better?
● Specifically, imagine a data structure that holds just whole numbers. Can we
make add and contains faster?
Can we do better?
● Specifically, imagine a data structure that holds just whole numbers. Can we
make add and contains faster?
● Yes! BooleanSet in lab09 is already faster! ϴ(1) add and contains!

index:  0  1  2  3  4  5  6  7
value:  F  T  T  T  T  F  F  F
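A minimal sketch of this boolean-array idea (just an illustration for whole numbers in a fixed range; not necessarily lab09's exact BooleanSet API):

// Illustration only: ϴ(1) add and contains for ints in [0, capacity).
class SimpleBooleanSet {
    private final boolean[] present;

    SimpleBooleanSet(int capacity) {
        present = new boolean[capacity];
    }

    void add(int x) {
        present[x] = true;   // ϴ(1)
    }

    boolean contains(int x) {
        return present[x];   // ϴ(1)
    }
}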

Problems?
Can we do better?
● Specifically, imagine a data structure that holds just whole numbers. Can we
make add and contains faster?
● Yes! BooleanSet in lab09 is already faster! ϴ(1) add and contains!

index:  0  1  2  3  4  5  6  7
value:  F  T  T  T  T  F  F  F

Problems!
You’ll need a huge array! Lots of wasted space!

Solution?
Can we do better?
Solution!
● Rather than placing booleans in each array spot, place the number itself.
● Wrap the numbers using modular arithmetic! (e.g. 8 mod 8 ≣ 0)

index:  0  1   2  3  4   5  6   7
value:  8  9  -1  3  12  5  -1  -1   (-1 marks an empty slot)

Problems?
Can we do better?
Solution!
● Rather than placing booleans in each array spot, place the number itself.
● Wrap the numbers using modular arithmetic! (e.g. 8 mod 8 ≣ 0)

index:  0  1   2  3  4   5  6   7
value:  8  9  -1  3  12  5  -1  -1   (-1 marks an empty slot)

Problems!
What happens if you want to add both 0 and 8 to the array? Collision!

Solution?
Can we do better?
Solution!
● Make the array hold linked lists of numbers to handle collisions!

bucket:    0     1  2  3  4      5  6  7
contents:  8, 0  9     3  12, 4  5

This is called external chaining!

What happens to runtime?

Can we do better?
Solution!
● Make the array hold linked lists of numbers to handle collisions!

[Diagram: the same external-chaining array as above.]

What happens to runtime?
● ϴ(1) add
● ϴ(linked list length) contains

Consider adding N items when the array size is a constant K: contains will run in ϴ(N).

Solution?
Can we do better?
General runtime:
● ϴ(1) add
● ϴ(linked list length) contains

Solution!
● Resize the array as elements are added so that the linked list length remains constant.

What happens to runtime?
● ϴ(1) add (due to amortization)
● ϴ(1) contains

[Diagram: the same external-chaining array as above.]
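To make this concrete, here is a hedged sketch of an external-chaining set of ints with load-factor-based resizing. The class name, load factor, and growth policy are illustrative choices, not the internals of java.util.HashSet.

import java.util.LinkedList;

class ChainedIntSet {
    private LinkedList<Integer>[] buckets;
    private int size;
    private static final double LOAD_FACTOR = 0.75;

    @SuppressWarnings("unchecked")
    ChainedIntSet() {
        buckets = new LinkedList[8];
    }

    private int indexOf(int x, int numBuckets) {
        return Math.floorMod(x, numBuckets);  // always a non-negative index
    }

    boolean contains(int x) {
        LinkedList<Integer> chain = buckets[indexOf(x, buckets.length)];
        return chain != null && chain.contains(x);  // ϴ(chain length)
    }

    void add(int x) {
        if (contains(x)) {
            return;
        }
        if (size + 1 > LOAD_FACTOR * buckets.length) {
            resize(2 * buckets.length);  // keep chains short on average
        }
        int i = indexOf(x, buckets.length);
        if (buckets[i] == null) {
            buckets[i] = new LinkedList<>();
        }
        buckets[i].add(x);
        size += 1;
    }

    @SuppressWarnings("unchecked")
    private void resize(int newLength) {
        LinkedList<Integer>[] old = buckets;
        buckets = new LinkedList[newLength];
        for (LinkedList<Integer> chain : old) {
            if (chain == null) continue;
            for (int x : chain) {  // rehash every element into the larger array
                int i = indexOf(x, newLength);
                if (buckets[i] == null) buckets[i] = new LinkedList<>();
                buckets[i].add(x);
            }
        }
    }
}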
Can we do better?
Other considerations:
● What if we want to store any type of object, not just numbers?
  Pass the object through a hash function that assigns a number to it.
● How do we know when to resize the underlying array?
  Choose a constant called the load factor that indicates when a resize is necessary.

[Diagram: the same external-chaining array as above.]
How does this work in Java? HashSet, HashMap
1. Hash function to assign numbers to objects
● Every Object has a hashCode method that returns an int.
● Implementation of hashCode is provided by the JVM (not specified directly in
language), but it is usually based on the internal address of the object.
● Subclasses of Object can override the default implementation to base the
hashCode on instance variables associated with the object.
● The default implementations of hashCode and equals are linked. If you override
one, you should override the other.
● If a.equals(b), then a.hashCode() == b.hashCode().
(Note that the converse is not necessarily true.)
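For example, a class that overrides both consistently might look like this sketch (the Point class is made up for illustration):

import java.util.Objects;

// Illustration: equals and hashCode overridden together, so that equal
// Points always produce equal hash codes.
class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) {
            return false;
        }
        Point other = (Point) o;
        return this.x == other.x && this.y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);  // equal Points get equal hashCodes
    }
}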
How does this work in Java? HashSet, HashMap
2. Way to map numbers (returned by hash function) to array indices
● In our design example earlier, we used % (mod operator) to “wrap” the
numbers around the array.
● Because hashCode can return a negative int, % alone wouldn’t work in Java: the %
operator’s result takes the sign of its left operand, so a negative hashCode could
produce a negative index.
● HashMap’s indexFor uses a very cool bitwise operator to map every number
to a non-negative index less than the array length. This operation relies on the fact
that the underlying array’s size is always a power of 2 (e.g. 16, 32, 64, etc.).
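The masking trick for power-of-two sizes looks roughly like this (a sketch of the idea the slide refers to; the exact HashMap source may differ):

// For length a power of 2, (length - 1) is a mask of low-order 1-bits, so
// h & (length - 1) is always a non-negative index in [0, length).
static int indexFor(int h, int length) {
    return h & (length - 1);
}

// Example: indexFor(-5, 16)
// -5 in two's complement ends in ...11111011; masking with 0b1111 keeps 1011,
// so -5 maps to bucket 11, a non-negative index less than 16.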
How does this work in Java? HashSet, HashMap
3. Load factor to determine when to resize underlying array
● The load factor is a fixed ratio of number of elements in the hashtable to the
number of buckets (length of the array).
● A resize is necessary when the number of elements in the data structure exceeds
the load factor multiplied by the length of the underlying array.
● HashMap’s load factor is 0.75.
● The initial capacity of the underlying array for a HashMap is 16. How many
elements would we need to add to trigger the first resize?
How does this work in Java? HashSet, HashMap
3. Load factor to determine when to resize underlying array
● The load factor is a fixed ratio of number of elements in the hashtable to the
number of buckets (length of the array).
● A resize is necessary when the number of elements in the data structure exceeds
the load factor multiplied by the length of the underlying array.
● HashMap’s load factor is 0.75.
● The initial capacity of the underlying array for a HashMap is 16. How many
elements would we need to add to trigger the first resize? 13
● What is the average length of a linked list (external chain) if a HashMap is
holding the max number of elements without triggering a resize?
How does this work in Java? HashSet, HashMap
3. Load factor to determine when to resize underlying array
● The load factor is a fixed ratio of number of elements in the hashtable to the
number of buckets (length of the array).
● A resize is necessary when the number of elements in the data structure exceeds
the load factor multiplied by the length of the underlying array.
● HashMap’s load factor is 0.75.
● The initial capacity of the underlying array for a HashMap is 16. How many
elements would we need to add to trigger the first resize? 13
● What is the average length of a linked list (external chain) if a HashMap is
holding the max number of elements without triggering a resize? 0.75
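Worked out: the resize threshold is 16 × 0.75 = 12 elements, so the 13th add triggers the first resize. Just before that resize, 12 elements are spread over 16 buckets, giving an average chain length of 12 / 16 = 0.75.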
Important Caveat
● How did we justify the runtime of contains as ϴ(1)?
Important Caveat
● How did we justify the runtime of contains as ϴ(1)?
The average length of an external chain is at most equal to the load factor. If
the load factor is a constant, contains must run in ϴ(1) in the average case.
● What about the worst case runtime?
Important Caveat
● How did we justify the runtime of contains as ϴ(1)?
The average length of an external chain is at most equal to the load factor. If
the load factor is a constant, contains must run in ϴ(1) in the average case.
● What about the worst case runtime?
The worst case runtime of contains would be ϴ(n) if all the items end up in
the same bucket.
● Is it possible for all the items to end up in the same bucket given the fact that
we’re resizing the underlying array?
Important Caveat
● How did we justify the runtime of contains as ϴ(1)?
The average length of an external chain is at most equal to the load factor. If
the load factor is a constant, contains must run in ϴ(1) in the average case.
● What about the worst case runtime?
The worst case runtime of contains would be ϴ(n) if all the items end up in
the same bucket.
● Is it possible for all the items to end up in the same bucket given the fact that
we’re resizing the underlying array?
Yes, it is still possible for all the items to end up in the same bucket if they all
have the same hashCode. This is why it is important for hash functions to distribute
elements evenly!
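For instance, a key class like this hypothetical one defeats the hash table, since every instance lands in the same bucket and contains degrades to ϴ(n):

// Hypothetical pathological key: equals() distinguishes keys, but hashCode()
// sends every key to the same bucket.
class BadKey {
    private final int id;

    BadKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == this.id;
    }

    @Override
    public int hashCode() {
        return 42;  // legal (equal keys get equal hashCodes) but terrible
    }
}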
HashSet vs. HashMap
class HashSet<E> implements Set<E>
class HashMap<K,V> implements Map<K,V>

● Recall that the Set interface is a subinterface of Collection.


● The Map interface represents another major category of abstract data type. It
associates keys with values (like a Python dictionary).
● HashSet’s underlying implementation just uses a HashMap (it stores its elements as
keys without associating meaningful values with them).
Joey’s Advice on Choosing Data Structures
You can almost 100% pin down your data structure with the following questions:

1. Do I need a set, map, or list?
2. Do I need mutability? (default to no)
3. Do I need it to be sorted? (default to no)
4. Do I need concurrency? (default to no)
● It's safer/more efficient to go with immutable data structures if you can.
● Sorted structures are slightly slower but allow for extra operations (like
selecting all items within a range). (These operations are prohibitively slow for
unsorted structures.)
● It's faster to go with non-concurrent data structures if you can.
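As a rough illustration (my mapping, not from the slides), the answers to these questions usually point straight at a standard java.util class:

import java.util.*;

class ChoosingStructures {
    public static void main(String[] args) {
        Set<String> tags = new HashSet<>();              // set, unsorted, non-concurrent
        Set<String> sortedTags = new TreeSet<>();        // set, sorted: supports range queries
        Map<String, Integer> counts = new HashMap<>();   // map, unsorted
        Map<String, Integer> ranked = new TreeMap<>();   // map, sorted
        List<Integer> sequence = new ArrayList<>();      // list, mutable
        List<Integer> readOnly =
            Collections.unmodifiableList(sequence);      // list, immutable view
    }
}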
Amortized Analysis
● Allows us to analyze the runtime of operations that are typically fast, but
sometimes more expensive, like array resizing.
● Use $1 = 1 unit of run time
● Assign to each operation:
○ Amortized cost: The amount we charge, in dollars, for that operation. The amortized runtime.
○ Actual cost: The amount of time spent by the operation, in dollars.
● When amortized cost > actual cost, save extra leftovers in the bank.
● Otherwise, take money out of the bank.
● The bank can never be negative!
Amortized Analysis: Analogy
● Your parent(s) gives you $10 for food every time you go out.
● You buy food, but because you’re cheap, you usually don’t spend the full $10.
Maybe you just get boba instead.
○ And then you save the remaining money.
● Every now and then, you splurge and go somewhere nice, like Kiraku or Chez
Panisse.
● In the end, you spend approximately all your money, so you do spend $10/meal
on average; the cheap meals effectively cover the occasional expensive one.
Amortized Analysis: Resizing
● Assign insert() to cost $5.
○ Insert without resizing takes $1, leaving an extra $4.
● It takes at least N/2 inserts to cause the array to resize
○ (starting from half full right after a resize)
● Thus at least (N/2 * $4) = $2N in the bank before a resize.
● Resize takes the array from length N to 2N, which is exactly how much money
is in the bank!
● Therefore we never go broke, and each insert has an amortized cost of at most $5, i.e. ϴ(1).
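A quick sanity check with concrete numbers (my example, using the accounting above): suppose the array has length N = 8 and is half full right after a resize. The next 4 inserts each do $1 of actual work and bank $4, so the bank holds 4 × $4 = $16 = $2N when the array fills up. The insert that then triggers the resize from length 8 to 16 is paid for by that $16, so the bank never goes negative.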
Project 2 Demo
(quick break)
