
Data Structures - Cheat Sheet

Trees

Red-Black Tree

1. Red Rule: A red child must have a black parent.
2. Black Rule: All paths to external nodes pass through the same number of black nodes.
3. All the leaves are black, and the sky is grey.

Rotations are terminal cases: they happen only once per fixup.
If we have a series of insert-delete operations for which the insertion point is known, the amortized cost of each action is O(1).
Height: log n ≤ h ≤ 2 log n
Limit of rotations: 2 per insert.
Bound of ratios between two branches L, R: $S(R) \le (S(L))^2$
Completely isomorphic to 2-4 Trees.

B-Tree

d defines the minimum number of keys on a node.
Height: $h \approx \log_d n$

1. Every node has at most d children and at least d/2 children (root excluded).
2. The root has at least 2 children if it isn't a leaf.
3. A non-leaf node with k children contains k − 1 keys.
4. On B+ trees, leaves appear at the same level.
5. Nodes at each level form linked lists.

d is optimized for HDD/cache block size.
Insert: Add to insertion point. If the node gets too large, split. $O(\log_d n) \le O(\log n)$
Split: The middle of the node (low median) moves up to be the edge of the parent node. O(d)
Delete: If the key is not in a leaf, switch with succ/pred. Delete, and deal with short node v:

1. If v is the root, discard; terminate.
2. If v has a non-short sibling, steal from it; terminate.
3. Fuse v with its sibling, repeat with v ← p[v].

Traversals

Traverse(t):
    if t == null then return
    → print(t)            // pre-order
    Traverse(t.left)
    → (OR) print(t)       // in-order
    Traverse(t.right)
    → (OR) print(t)       // post-order

Binary Heap

Melding: If the heap is represented by an array, link the two arrays together and Heapify (bottom-up build). O(n).

Binomial Heap

Melding: Unify trees by rank like binary summation. O(log n)

Fibonacci Heap

Maximum degree: $D(n) \le \log_{\phi} n$ ; $\phi = \frac{1 + \sqrt{5}}{2}$
Minimum size of degree k: $s_k \ge F_{k+2}$
Marking: Every node which lost one child is marked.
Cascading Cut: Cut every marked node climbing upwards. Keeps amortized O(log n) time for deleteMin. Otherwise O(n).

Proof of the $\phi^k$ node size bound:

1. All subtrees of a node x, sorted by order of insertion, are of degree $D[s_i] \ge i - 2$. (Proof: when x's i-th subtree was added, D[x] was i − 1, and so was the degree of the subtree. Since then, it could lose only one child, so its degree is at least i − 2.)
2. $F_{k+2} = 1 + \sum_{i=0}^{k} F_i$ ; $F_{k+2} \ge \phi^k$
3. If x is a node and k = deg[x], $S_x \ge F_{k+2} \ge \phi^k$. (Proof: Assume induction after the base cases and then $s_k = 2 + \sum_{i=2}^{k} s_{i-2} \ge 2 + \sum_{i=2}^{k} F_i = 1 + \sum_{i=0}^{k} F_i = F_{k+2}$.)

Structures

Median Heap: one min-heap and one max-heap with ∀x ∈ min, y ∈ max : x > y, kept equal in size (up to one element); the median is then at the root of one of the heaps.

Sorting

Comparables

Algorithm        Expected      Worst         Storage
QuickSort        O(n log n)    O(n²)         In-Place
    Partition recursively at each step.
BubbleSort       O(n²)         O(n²)         In-Place
SelectionSort    O(n²)         O(n²)         In-Place
    Traverse n slots keeping score of the maximum. Swap it with A[n]. Repeat for A[n − 1].
HeapSort         O(n log n)    O(n log n)    Aux
InsertionSort                                Aux
MergeSort        O(n log n)    O(n log n)    Aux
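
As a rough illustration of two of the comparison sorts listed above, the sketch below implements SelectionSort as the sheet describes it (track the maximum, swap it to the end, shrink the range) and an in-place QuickSort. The random pivot and the Lomuto-style partition are assumptions made for this example; the sheet only says to partition recursively.

```python
import random


def selection_sort(a):
    """Move the maximum of a[0..end] to position end, then shrink end."""
    for end in range(len(a) - 1, 0, -1):
        max_idx = 0
        for i in range(1, end + 1):
            if a[i] > a[max_idx]:
                max_idx = i
        a[max_idx], a[end] = a[end], a[max_idx]
    return a


def quick_sort(a, lo=0, hi=None):
    """In-place QuickSort; pivot choice (random) is an assumption."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    # Move a randomly chosen pivot to the end of the range.
    p = random.randint(lo, hi)
    a[p], a[hi] = a[hi], a[p]
    pivot, store = a[hi], lo
    # Partition: everything smaller than the pivot goes left of `store`.
    for i in range(lo, hi):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]
    # Recurse on both sides of the pivot's final position.
    quick_sort(a, lo, store - 1)
    quick_sort(a, store + 1, hi)
    return a
```

Both functions sort the list they are given and return it, so `quick_sort(list(data))` leaves `data` untouched.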

Linear Time

BucketSort Θ(n):
If the range is known, make the appropriate number of buckets, then:
1. Scatter: Go over the original array, putting each object in its bucket.
2. Sort each non-empty bucket (recursively or otherwise).
3. Gather: Visit the buckets in order and put all elements back into the original array.

CountSort Θ(n):
1. Given an array A bounded in the discrete range C, initialize an array with that size.
2. Passing through A, increment every occurrence of a number i in its proper slot in C.
3. Passing through C, add the number represented by i into A a total of C[i] times.

RadixSort Θ(n):
1. Take the least significant digit.
2. Group the keys based on that digit, but otherwise keep the original order of keys. (This is what makes the LSD radix sort a stable sort.)
3. Repeat the grouping process with each more significant digit.

Heaps

               Binary       Binomial     Fibonacci
findMin        Θ(1)         Θ(1)         Θ(1)
deleteMin      Θ(log n)     Θ(log n)     O(log n)
insert         Θ(log n)     O(log n)     Θ(1)
decreaseKey    Θ(log n)     Θ(log n)     Θ(1)
meld           Θ(n)         Θ(log n)     Θ(1)

Two-Level Hashing

The number of collisions per level: $\sum_{i=0}^{n-1} \binom{n_i}{2} = |Col|$
1. Choose m = n and h such that |Col| < n.
2. Store the $n_i$ elements hashed to i in a small table of size $n_i^2$ using a perfect hash function $h_i$.

Random algorithm for constructing a perfect two level hash table:
1. Choose a random h from H(n) and compute the number of collisions. If there are more than n collisions, repeat.
2. For each cell i, if $n_i > 1$, choose a random hash function from $H(n_i^2)$. If there are any collisions, repeat.

Expected construction time: O(n)
Worst case search time: O(1)
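
The random two-level construction described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes distinct non-negative integer keys smaller than the prime `P`, a non-empty key set, and a hash family of the form ((a·k + b) mod P) mod m. The names `random_hash`, `build_two_level` and `contains` are hypothetical helpers introduced for this example.

```python
import random

P = 2_147_483_647  # prime modulus; assumption: every key is in [0, P)


def random_hash(m):
    """Draw h(k) = ((a*k + b) mod P) mod m with random a, b."""
    a = random.randrange(1, P)
    b = random.randrange(0, P)
    return lambda k: ((a * k + b) % P) % m


def build_two_level(keys):
    """Build a two-level table for a non-empty list of distinct keys."""
    n = len(keys)
    # Level 1: retry until the number of colliding pairs is below n.
    while True:
        h = random_hash(n)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h(k)].append(k)
        collisions = sum(len(b) * (len(b) - 1) // 2 for b in buckets)
        if collisions < n:
            break
    # Level 2: a bucket of size n_i gets a collision-free table of size n_i^2.
    secondary = []
    for bucket in buckets:
        if not bucket:
            secondary.append(None)
            continue
        size = len(bucket) ** 2
        while True:
            h_i = random_hash(size)
            slots = [None] * size
            clash = False
            for k in bucket:
                j = h_i(k)
                if slots[j] is not None:
                    clash = True  # collision inside the bucket: redraw h_i
                    break
                slots[j] = k
            if not clash:
                break
        secondary.append((h_i, slots))
    return h, secondary


def contains(table, k):
    """Look up k with two hash evaluations and one comparison."""
    h, secondary = table
    entry = secondary[h(k)]
    if entry is None:
        return False
    h_i, slots = entry
    return slots[h_i(k)] == k
```

Construction is randomized, so its running time is only expected linear, while every lookup touches exactly one first-level cell and one second-level slot.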

Selection

Algorithm         Expected    Worst
QuickSelect       O(n)        O(n²)
5-tuple Select

Union-Find

MakeSet(x)    Union(x, y)    Find(x)
O(1)          O(1)           O(α(k))

Union by Rank: The larger tree remains the master tree in every union.
Path Compression: every find operation first finds the master root, then repeats its walk to change the subroots.

Recursion

Master Theorem: for $T(n) = a\,T\left(\frac{n}{b}\right) + f(n)$ ; a ≥ 1, b > 1, ε > 0:

$$
T(n) =
\begin{cases}
\Theta\left(n^{\log_b a}\right) & f(n) = O\left(n^{\log_b a - \varepsilon}\right) \\
\Theta\left(n^{\log_b a} \log^{k+1} n\right) & f(n) = \Theta\left(n^{\log_b a} \log^{k} n\right),\ k \ge 0 \\
\Theta\left(f(n)\right) & f(n) = \Omega\left(n^{\log_b a + \varepsilon}\right) \text{ and } a f\left(\frac{n}{b}\right) \le c f(n) \text{ for some } c < 1
\end{cases}
$$

Building a recursion tree: build one tree for running times (at T(αn)) and one for f(n).

Hashing

Universal Family: a family of mappings H. ∀h ∈ H. h : U → [m] is universal iff $\forall k_1 \ne k_2 \in U : \Pr_{h \in H}\left[h(k_1) = h(k_2)\right] \le \frac{1}{m}$
Example: If U = [p] = {0, 1, . . . , p − 1} with p prime, then $H_{p,m} = \{h_{a,b} \mid 1 \le a < p;\ 0 \le b < p\}$ and every hash function is $h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$
Linear Probing: Search in incremental order through the table from h(x) until a vacancy is found.
Open Addressing: Use $h_1(x)$ to hash and $h_2(x)$ to permute. No pointers.
Open Hashing:
Perfect Hash: When one function clashes, try another. O(∞).
Load Factor α: The length of a possible collision chain. When |U| = n, $\alpha = \frac{n}{m}$.
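
To make the universal family and linear probing above concrete, here is a minimal sketch of an open-addressing table. It assumes non-negative integer keys below the prime `P`, draws a and b at random once per table, and omits deletion (which would need tombstones). The class name `LinearProbingTable` is hypothetical, chosen only for this example.

```python
import random

P = 2_147_483_647  # prime modulus; assumption: every key is in [0, P)


class LinearProbingTable:
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m
        # One member h_{a,b} of the universal family, fixed for this table.
        self.a = random.randrange(1, P)
        self.b = random.randrange(0, P)

    def _h(self, k):
        return ((self.a * k + self.b) % P) % self.m

    def _probe(self, k):
        """Yield slot indices h(k), h(k)+1, ... (mod m), each exactly once."""
        start = self._h(k)
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, k):
        for j in self._probe(k):
            if self.slots[j] is None or self.slots[j] == k:
                self.slots[j] = k
                return j
        raise RuntimeError("table is full")

    def contains(self, k):
        for j in self._probe(k):
            if self.slots[j] is None:
                return False  # reached a vacancy: k was never inserted
            if self.slots[j] == k:
                return True
        return False
```

A search stops at the first vacancy, which is why the expected probe counts grow quickly as the load factor approaches 1.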

Methods

Modular: Multiplicative, Additive, Tabular(byte)-additive

Performance

Chaining                         E[X]                      Worst Case
Successful Search/Del            $\frac{1}{2}(1 + \alpha)$    n
Failed Search/Verified Insert    $1 + \alpha$                 n

Probing

Linear: $h(k, i) = (h'(k) + i) \bmod m$
Quadratic: $h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$
Double: $h(k, i) = (h_1(k) + i\,h_2(k)) \bmod m$

E[X]            Unsuccessful Search                                        Successful Search
Uni. Probing    $\frac{1}{1-\alpha}$                                       $\frac{1}{\alpha} \ln \frac{1}{1-\alpha}$
Lin. Probing    $\frac{1}{2}\left(1 + \left(\frac{1}{1-\alpha}\right)^2\right)$    $\frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$

So Linear Probing is slightly worse but better for cache.

Collision Expectation: $P\left[X \le 2E[X]\right] \ge \frac{1}{2}$
So:
1. if m = n then $E[|Col|] < \frac{n}{2}$, so $P\left[|Col| < n\right] \ge \frac{1}{2}$
2. if m = n² then $P\left[|Col| < 1\right] \ge \frac{1}{2}$, and with case 2 there are no collisions.

Orders of Growth

f = O(g): $\limsup_{x \to \infty} \frac{f}{g} < \infty$        f = o(g): $\frac{f}{g} \to 0$ as $x \to \infty$
f = Θ(g): $\lim_{x \to \infty} \frac{f}{g} \in \mathbb{R}^+$
f = Ω(g): $\liminf_{x \to \infty} \frac{f}{g} > 0$        f = ω(g): $\frac{f}{g} \to \infty$ as $x \to \infty$

Amortized Analysis

Potential Method: Set Φ to examine a parameter on data structure $D_i$, where i indexes the state of the structure. If $c_i$ is the actual cost of action i, then $\hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1})$.
The total potential amortized cost will then be
$$\sum_{i=1}^{n} \hat{c}_i = \sum_{i=1}^{n} \left(c_i + \Phi(D_i) - \Phi(D_{i-1})\right) = \sum_{i=1}^{n} c_i + \Phi(D_n) - \Phi(D_0)$$

Deterministic algorithm: Always predictable.

Stirling's Approximation: $n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^n \Rightarrow \log n! \sim n \log n - n$

Scribbled by Omer Shapira, based on the course “Data Structures” at Tel Aviv University.
Redistribute freely.
Website: http://www.omershapira.com
